Preliminary analyses for microeukaryote tag-sequence survey.

Description: Investigate the diversity of single-celled microbial eukaryotic communities across several deep-sea hydrothermal vent sites (including NE Pacific, Caribbean). We plan to address questions related to the environmental factors that shape protistan community dynamics, and determine if patterns in species diversity and distribution vary at different deep-sea habitats. These questions will be addressed using similarly generated metabarcoding data from several distinct hydrothermal vents. Along with characterizing community structure, we plan to evaluate interactions between protist species (to identify putative predator-prey or parasite-host relationships) and their environment (to explore their relationship to geochemical properties).

Questions to address

What is the general biogeography and distribution of the deep-sea hydrothermal vent microbial eukaryotic community?

What community structure features (i.e., species richness, proportion cosmopolitan versus endemic, species evenness) are shared across or unique to deep-sea hydrothermal vent sites?

What environmental features (i.e., temperature, geochemistry) influence microbial eukaryotic community diversity? Can we identify if certain environmental factors select for putative vent endemics?

1 Environmental characterization of all vent sites

Three sites in separate ocean regions.

metadata <- read.delim("data-input/samplelist-metadata.txt")
# head(metadata)

Filter to environmental data only

env_deepsea <- metadata %>% 
  filter(Sample_or_Control == "Sample") %>% 
  filter(!(SAMPLETYPE == "Incubation")) %>% 
  filter(!(SAMPLETYPE == "Microcolonizer")) %>% 
  select(VENT, COORDINATES, SITE, SAMPLEID, DEPTH, SAMPLETYPE, YEAR, TEMP, pH, PercSeawater, Mg, NO3, H2, H2S, CH4, ProkConc) %>% 
  pivot_longer(cols = TEMP:ProkConc, names_to = "VARIABLE", values_to = "VALUE") %>% 
  distinct() %>% 
  group_by(VENT, COORDINATES, SITE, DEPTH, SAMPLETYPE, YEAR, VARIABLE) %>% 
  summarise(VALUE_AVG = mean(VALUE),
            SAMPLEIDs = str_c(SAMPLEID, collapse = ", "))
  
# View(env_deepsea)
unique(env_deepsea$SITE)
## [1] "Axial"      "VonDamm"    "Piccard"    "GordaRidge"

1.1 Visualize geochemistry

geochem_1 <- env_deepsea %>% 
  filter(VARIABLE != "ProkConc") %>% 
  ggplot(aes(x = SAMPLETYPE, y = VALUE_AVG, fill = SITE, shape = SAMPLETYPE))+
  geom_jitter(size = 3) +
  facet_wrap(VARIABLE ~ ., scales = "free") +
  scale_shape_manual(values = c(21, 23, 24)) +
  scale_fill_manual(values = c("#fdbb84", "#31a354", "#ef3b2c", "#02818a")) +
  guides(fill = guide_legend(override.aes = list(shape = 21) ),
              shape = guide_legend(override.aes = list(fill = "black"))) +
  theme_bw() +
    theme(axis.text = element_text(color="black", size=12),
          legend.title = element_blank(),
          axis.title = element_text(color="black", size=14),
          legend.text = element_text(color = "black", size = 14),
          plot.margin = margin(2, 1, 2, 1, "cm"),
          strip.background = element_blank()) +
  labs(x = "", y = "")
geochem_1
## Warning: Removed 231 rows containing missing values (geom_point).

geochem_2 <- env_deepsea %>% 
  filter(VARIABLE == "ProkConc") %>% 
  ggplot(aes(x = SAMPLETYPE, y = VALUE_AVG, fill = SITE, shape = SAMPLETYPE))+
  geom_jitter(size = 3) +
  facet_wrap(VARIABLE ~ ., scales = "free") +
  scale_y_log10() +
  scale_shape_manual(values = c(21, 23, 24)) +
  scale_fill_manual(values = c("#fdbb84", "#31a354", "#ef3b2c", "#02818a")) +
  guides(fill = guide_legend(override.aes = list(shape = 21) ),
              shape = guide_legend(override.aes = list(fill = "black"))) +
  theme_bw() +
    theme(axis.text = element_text(color="black", size=12),
          legend.title = element_blank(),
          axis.title = element_text(color="black", size=14),
          legend.text = element_text(color = "black", size = 14),
          plot.margin = margin(2, 1, 2, 1, "cm"),
          strip.background = element_blank()) +
  labs(x = "", y = "")
geochem_2
## Warning: Removed 14 rows containing missing values (geom_point).

1.2 Geochemistry Table

Generate output table for all parameters measured

geomchem_table <- env_deepsea %>% 
  pivot_wider(names_from = "VARIABLE", values_from = "VALUE_AVG")

# write_delim(geomchem_table, path = "table-geochem-params.txt", delim = "\t")

2 Import exisiting sequence data

Datasets included: - Gorda Ridge 2019 cruise - Axial Seamount time series - 2013, 2014, & 2015 - Mid-Cayman Rise 2020 cruise

All data generated from extracted RNA, reverse transcribed to cDNA and amplified with primers that target the V4 hypervariable region on the 18S rRNA gene.

Analysis done with QIIME2, kept 40-60% of the sequences through the QC process and generated Amplicon Sequence Variants (ASVs) with DADA2. Taxonomic assignment done with vsearch using the PR2 database (v4.14) at 80% identity. See the seq-analysis directory for QIIME2 code.

2.1 Merged data from QIIME2 analysis

After determining ASVs for each sequence run, ASV tables were merged.

merged_tax <- read_delim("data-input/taxonomy.tsv", delim = "\t")
merged_asv <- read_delim("data-input/microeuk-merged-asv-table.tsv", delim = "\t", skip = 1)
# head(merged_tax)

2.1.1 Import metadata & reformat

Still want to find more metadata. As of Oct 16, have temperature and prokaryote cell/ml if available, but can add in more metadata for the MCR cruise.

metadata <- read.delim("data-input/samplelist-metadata.txt")
# head(metadata)

Remove samples from Gorda Ridge microcolonizers and from the FLP experiments (Gorda Ridge and Mid-Cayman Rise).

asv_wtax <- merged_asv %>%
  select(FeatureID = '#OTU ID', everything()) %>%
  pivot_longer(cols = !FeatureID,
               names_to = "SAMPLE", values_to = "value") %>%
  left_join(merged_tax, by = c("FeatureID" = "Feature ID")) %>%
  left_join(metadata) %>%
  filter(!grepl("Siders_", SAMPLE)) %>% 
  filter(SAMPLETYPE != "Incubation") %>% 
  filter(SAMPLETYPE != "Microcolonizer") %>%
  mutate(DATASET = case_when(
    grepl("_GR_", SAMPLE) ~ "GR",
    grepl("Gorda", SAMPLE) ~ "GR",
    grepl("_MCR_", SAMPLE) ~ "MCR",
    grepl("Axial", SAMPLE) ~ "Axial",
  TRUE ~ "Control or blank")) %>%
    separate(Taxon, c("Domain", "Supergroup",
                  "Phylum", "Class", "Order",
                  "Family", "Genus", "Species"), sep = ";", remove = FALSE) %>% 
  unite(SAMPLENAME, SITE, SAMPLETYPE, YEAR, VENT, SAMPLEID, sep = " ", remove = FALSE)

# View(asv_wtax)
# head(asv_wtax) ## Complete ASV table with full taxonomy names and annotated sample information

2.1.2 Total number of sequences

Barplots to show total number of sequences and total number of ASVs.

Total number of sequences and ASVs parallel each other. The Axial and Gorda Ridge data were run on the same sequence run, with Mid-Cayman Rise run on a separate MiSeq run - so the average number of sequences (and ASVs) varies between these two runs. A few samples have too few sequences, they will be removed below.

This newest version of PR2 has bacteria and archaea in it. Very, very few were assigned to this. Majority assigned to eukaryotes.

head(asv_wtax)
library(viridis)
plot_grid(
  # Total number of ASVs
  asv_wtax %>% 
  filter(value > 0) %>% 
  filter(Sample_or_Control == "Sample") %>% 
  ggplot(aes(x = SAMPLENAME)) +
  geom_bar(stat = "count", width = 0.9) +
  labs(y = "Total ASVs per sample", x = "") +
  coord_flip() +
    scale_y_continuous(position = "right") +
  theme_linedraw() +
  facet_grid(DATASET ~ ., scale = "free", space = "free") +
  theme(axis.text.x = element_text(angle = 0, hjust = 1, vjust = 1),
        axis.text.y = element_text(angle = 0, hjust = 1, vjust = 1),
        strip.background = element_blank(), strip.text = element_text(color = "black")),
  asv_wtax %>% 
    filter(Sample_or_Control == "Sample") %>%
  group_by(SAMPLENAME, SITE, Domain, DATASET) %>% 
  summarise(SUM_SEQ_DOMAIN = sum(value)) %>% 
  ggplot(aes(x = SAMPLENAME, y = SUM_SEQ_DOMAIN, fill = Domain)) +
  geom_bar(stat = "identity", color = "black", width = 0.9) +
  labs(y = "Total sequences per sample", x = "") +
  coord_flip() +
  viridis::scale_fill_viridis(discrete=TRUE) +
  scale_y_continuous(position = "right") +
  theme_linedraw() +
  facet_grid(DATASET ~ ., scale = "free", space = "free") +
  theme(axis.text.x = element_text(angle = 0, hjust = 1, vjust = 1),
        axis.text.y = element_text(angle = 0, hjust = 1, vjust = 1),
        strip.background = element_blank(), strip.text = element_text(color = "black"),
        legend.position = "right"),
  ncol = 2, align = c("hv"), axis = c("lr"))

2.1.3 Table reporting total ASVs and sequences

table_raw_stats <- asv_wtax %>% filter(value > 0) %>%
       group_by(SAMPLENAME, DATASET, SITE) %>%
       summarise(SEQ_SUM = sum(value),
                 ASV_COUNT = n()) %>% 
  ungroup() %>% 
  gt(
    groupname_col = c("DATASET", "SITE"),
    rowname_col = "SAMPLENAME"
  )
table_raw_stats
gtsave(table_raw_stats, filename = "seq_asv_count_nonQC.html", path = "output-tables/")

After removing contaminate ASVs below, I will set threshold of 10,000 sequences- if a sample has fewer than this, chuck it.

2.2 Decontaminate sequence library

Import sample description text file, import as phyloseq library, and remove potential contaminate ASVs and sequences. Catalog total number of ASVs and sequences removed from analysis.

# library(decontam); library(phyloseq)

2.2.1 Import as phyloseq objects

tax_matrix <- merged_tax %>% 
  select(FeatureID = `Feature ID`, Taxon) %>% 
  separate(Taxon, c("Domain", "Supergroup", 
                  "Phylum", "Class", "Order",
                  "Family", "Genus", "Species"), sep = ";", remove = FALSE) %>% 
  column_to_rownames(var = "FeatureID") %>% 
  as.matrix

asv_matrix <- merged_asv %>% 
  select(FeatureID = '#OTU ID', everything()) %>% 
  column_to_rownames(var = "FeatureID") %>% 
  as.matrix

# Align row names for each matrix
rownames(tax_matrix) <- row.names(asv_matrix)

# Set rownames of metadata table to SAMPLE information
row.names(metadata) <- metadata$SAMPLE
# Import asv and tax matrices
ASV = otu_table(asv_matrix, taxa_are_rows = TRUE)
TAX = tax_table(tax_matrix)
phylo_obj <- phyloseq(ASV, TAX)

# Import metadata as sample data in phyloseq
samplenames <- sample_data(metadata)

# join as phyloseq object
physeq_wnames = merge_phyloseq(phylo_obj, samplenames)
# colnames(ASV)

## Check
# physeq_wnames

2.2.2 Identify contaminant ASVs

# When "Control" appears in "Sample_or_Control column, this is a negative control"
sample_data(physeq_wnames)$is.neg <- sample_data(physeq_wnames)$Sample_or_Control == "Control"
# ID contaminants using Prevalence information
contam_prev <- isContaminant(physeq_wnames, 
                               method="prevalence", 
                               neg="is.neg", 
                               threshold = 0.5, normalize = TRUE) 

# Report number of ASVs IDed as contaminants
table(contam_prev$contaminant)

0.5 - this threshold will ID contaminants in all samples that are more prevalent in negative controls than in positive samples.

As of Oct 16 2021: 56 ASVs deemed to be contaminant and will be removed.

2.2.3 Remove problematic ASVs

# Subset contaminant ASVs
contams <- filter(contam_prev, contaminant == "TRUE")
list_of_contam_asvs <- as.character(row.names(contams))
# length(list_of_contam_asvs)

taxa_contam <- as.data.frame(tax_matrix) %>% 
  rownames_to_column(var = "FeatureID") %>% 
  filter(FeatureID %in% list_of_contam_asvs)
# head(taxa_contam)
# View(asv_wtax)
asv_wtax_decon <- asv_wtax %>% 
  filter(!(FeatureID %in% list_of_contam_asvs)) %>% 
  filter(!(Sample_or_Control == "Control"))

tmp_orig <- (asv_wtax %>% filter(!(Sample_or_Control == "Control")))

# Stats on lost
x <- length(unique(tmp_orig$FeatureID)); x
y <- length(unique(asv_wtax_decon$FeatureID)); y
y-x
100*((y-x)/x) # 56 total ASVs lost
a <- sum(tmp_orig$value);a #3.817 million
b <- sum(asv_wtax_decon$value);b #3.799 million 
100*((b-a)/a)
# Lost 0.47% of sequences from whole dataset.

## Subsample to clean ASVs
asv_wtax_wstats <- asv_wtax %>% 
  mutate(DECONTAM = case_when(
    FeatureID %in% list_of_contam_asvs ~ "FAIL",
    TRUE ~ "PASS"
  ))

Started with 17934 ASVs, post-decontamination, we have 17878 (a loss of 56 ASVs).

Data started with 3817219 sequences, after removing 56 ASVs, we have 3788791 total sequences. There was a total loss of 0.74% of sequences.

plot_grid(asv_wtax_wstats %>% 
  filter(value > 0) %>% 
  ggplot(aes(x = SAMPLE, fill = DECONTAM)) +
  geom_bar(stat = "count", width = 0.9, color = "black") +
  labs(y = "Total ASVs") +
  coord_flip() +
  theme_linedraw() +
  facet_grid(DATASET ~ ., scale = "free", space = "free") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 1),
        strip.background = element_blank(), strip.text = element_text(color = "black"),
        legend.position = "bottom"),
  asv_wtax_wstats %>% 
  group_by(SAMPLE, SITE, DECONTAM, DATASET) %>% 
  summarise(SUM_SEQ_DOMAIN = sum(value)) %>% 
  ggplot(aes(x = SAMPLE, y = SUM_SEQ_DOMAIN, fill = DECONTAM)) +
  geom_bar(stat = "identity", color = "black", width = 0.9) +
  labs(y = "Total Sequences") +
  coord_flip() +
  theme_linedraw() +
  facet_grid(DATASET ~ ., scale = "free", space = "free") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 1),
        strip.background = element_blank(), strip.text = element_text(color = "black"),
        legend.position = "bottom"),
  ncol = 2)

This plot shows the distribution of ASVs and sequences that failed or passed the decontamination step. Most obvious are the control samples that indicated the potentially contaminate ASVs.

3 Survey of sequences from in situ samples

# colnames(asv_wtax_wstats)
# unique(asv_wtax_wstats$SAMPLE)
sites <- c("Piccard", "VonDamm", "Axial", "GordaRidge")
asv_insitu <- asv_wtax_wstats %>% filter(Sample_or_Control != "Control") %>% 
       filter(SITE %in% sites) %>% 
       filter(!grepl("_expTf_", SAMPLE)) %>% 
       filter(value > 0) %>% 
       filter(DECONTAM == "PASS")

# Get quick stats on totals
sum(asv_insitu$value) # 3.8 million sequences
length(unique(asv_insitu$FeatureID)) #12,379 ASVs

Final in situ dataset includes 3.79 million sequences and 12,378 ASVs total.

Additional sample QC, check replicates, and determine if replicates should be averaged.

plot_grid(asv_insitu %>% 
  group_by(SAMPLENAME, VENT, DATASET, Domain) %>% 
  summarise(seqsum_var = sum(value),
            asvcount_var = n()) %>% 
  pivot_longer(ends_with("_var"), names_to = "VARIABLE") %>% 
  ggplot(aes(x = SAMPLENAME, y = value, fill = Domain)) +
    geom_bar(color = "black", stat = "identity", position = "fill") +
    facet_grid(VARIABLE ~ DATASET, space = "free", scales = "free") +
  scale_y_continuous(expand = c(0,0)) +
  theme_linedraw() +
  scale_fill_brewer(palette = "Paired") +
  theme(strip.background = element_blank(), strip.text = element_text(color = "black"),
        axis.text.x = element_text(color = "black", angle = 90, hjust = 1, vjust = 0.5),
        legend.position = "bottom"),
  asv_insitu %>% 
  group_by(SAMPLENAME, VENT, DATASET, Domain) %>% 
  summarise(seqsum_var = sum(value),
            asvcount_var = n()) %>% 
  pivot_longer(ends_with("_var"), names_to = "VARIABLE") %>% 
  ggplot(aes(x = SAMPLENAME, y = value, fill = Domain)) +
    geom_bar(color = "black", stat = "identity", position = "stack") +
    facet_grid(VARIABLE ~ DATASET, space = "free_x", scales = "free") +
  scale_y_continuous(expand = c(0,0)) +
  theme_linedraw() +
  scale_fill_brewer(palette = "Paired") +
  theme(strip.background = element_blank(), strip.text = element_text(color = "black"),
        axis.text.x = element_text(color = "black", angle = 90, hjust = 1, vjust = 0.5),
        legend.position = "bottom"),
ncol = 2)

3.1 Base bar plot - taxonomy

asv_insitu %>% 
  filter(Domain == "Eukaryota") %>%
  # unite(SampleIdentifier, VENT, SAMPLETYPE, sep = " ", remove = FALSE) %>% 
  mutate(Supergroup = ifelse(is.na(Supergroup), "Unknown Eukaryota", Supergroup),
         Phylum = ifelse(is.na(Phylum), "Unknown", Phylum),
         Phylum = ifelse(Phylum == "Alveolata_X", "Ellobiopsidae", Phylum),
         Supergroup = ifelse(Supergroup == "Alveolata", paste(Supergroup, Phylum, sep = "-"), Supergroup)) %>% 
  group_by(FeatureID, Taxon, Domain, Supergroup, Phylum, Class, Order, Family, Genus, Species,
           VENT, SITE, SAMPLETYPE, YEAR, DATASET) %>% 
    summarise(SEQ_AVG_REP = mean(value)) %>% 
  ungroup() %>% 
  group_by(SITE, SAMPLETYPE, VENT, Supergroup) %>% 
    summarise(SEQ_SUM = sum(SEQ_AVG_REP)) %>% 
  ggplot(aes(x = VENT, y = SEQ_SUM, fill = Supergroup)) +
    geom_bar(stat = "identity", position = "stack", color = "black", width = 0.9) +
    facet_grid(. ~ SITE + SAMPLETYPE, scale = "free", space = "free") +
  theme_linedraw() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 1),
        strip.background = element_blank(), strip.text = element_text(color = "black")) +
  scale_y_continuous(expand = c(0,0)) +
  scale_fill_manual(values = c("#f1eef6", "#d7b5d8", "#df65b0", "#ce1256", "#fc9272", "#ef3b2c", 
    "#800026", "#fff7bc", "#fec44f", "#d95f0e", "#a63603", "#74c476", "#238b45", 
    "#00441b", "#7fcdbb", "#084081", "#c6dbef", "#2b8cbe", "#016c59", "#bcbddc", 
    "#807dba", "#54278f", "#bdbdbd", "black"))

Repeat taxonomy barplot, but with relative abundance

asv_insitu %>% 
  filter(Domain == "Eukaryota") %>%
  # unite(SampleIdentifier, VENT, SAMPLETYPE, sep = " ", remove = FALSE) %>% 
  mutate(Supergroup = ifelse(is.na(Supergroup), "Unknown Eukaryota", Supergroup),
         Phylum = ifelse(is.na(Phylum), "Unknown", Phylum),
         Phylum = ifelse(Phylum == "Alveolata_X", "Ellobiopsidae", Phylum),
         Supergroup = ifelse(Supergroup == "Alveolata", paste(Supergroup, Phylum, sep = "-"), Supergroup)) %>% 
  group_by(FeatureID, Taxon, Domain, Supergroup, Phylum, Class, Order, Family, Genus, Species,
           VENT, SITE, SAMPLETYPE, YEAR, DATASET) %>% 
    summarise(SEQ_AVG_REP = mean(value)) %>% 
  ungroup() %>% 
  group_by(SITE, SAMPLETYPE, VENT, Supergroup) %>% 
    summarise(SEQ_SUM = sum(SEQ_AVG_REP)) %>% 
  ggplot(aes(x = VENT, y = SEQ_SUM, fill = Supergroup)) +
    geom_bar(stat = "identity", position = "fill", color = "black", width = 0.9) +
    facet_grid(. ~ SITE + SAMPLETYPE, scale = "free", space = "free") +
  theme_linedraw() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 1),
        strip.background = element_blank(), strip.text = element_text(color = "black")) +
  scale_y_continuous(expand = c(0,0)) +
  scale_fill_manual(values = c("#f1eef6", "#d7b5d8", "#df65b0", "#ce1256", "#fc9272", "#ef3b2c", 
    "#800026", "#fff7bc", "#fec44f", "#d95f0e", "#a63603", "#74c476", "#238b45", 
    "#00441b", "#7fcdbb", "#084081", "#c6dbef", "#2b8cbe", "#016c59", "#bcbddc", 
    "#807dba", "#54278f", "#bdbdbd", "black"))

4 Removal of samples - QC

Filter samples so that the total number of sequences is greater than 20,000 sequences.

# head(asv_insitu)
# unique(asv_insitu$Sample_or_Control)
# hist(asv_insitu$value)
tmp <- (asv_insitu %>% 
          group_by(SAMPLE, SAMPLENAME) %>% 
          summarise(SUM = sum(value)) %>% 
        filter(SUM < 20000))
toofew <- as.character(unique(tmp$SAMPLE))
toofew

Samples: Axial_Dependable_FS900_2013 and GordaRidge_BSW020_sterivex_2019_REPa removed due to too few sequences.

4.1 Sequence and ASV table summary

Final table reporting total sequences and ASVs for each sample.

asv_insitu_qc <- asv_insitu %>% 
  filter(!(SAMPLE %in% toofew)) %>% 
  filter(value > 0)

stats_seq_asv_postQC <- asv_insitu_qc %>% 
  group_by(SAMPLEID, VENT, DATASET, SITE, SAMPLETYPE, YEAR) %>%
    summarise(SEQ_SUM = sum(value),
     ASV_COUNT = n()) %>% 
  ungroup() %>% 
  gt(
    groupname_col = c("DATASET", "SITE", "YEAR"),
    rowname_col = "SAMPLEID"
  ) %>%
  tab_header(title = "Final sequence & ASV count")

stats_seq_asv_postQC
# sum(asv_insitu_qc$value)
# length(unique(asv_insitu_qc$FeatureID))
gtsave(stats_seq_asv_postQC, filename = "output-tables/seq_asv_count_postQC.html")

5 Assess ASV presence/absence

Set up analysis to classify each ASV based on distribution

# head(asv_insitu_qc)
# head(insitu_wide)
# unique(asv_insitu_qc$SAMPLETYPE)
# unique(asv_insitu_qc$SITE)

tax_asv_id <- asv_insitu_qc %>% 
  filter(value > 0) %>% #remove zero values
  select(FeatureID, SITE, SAMPLETYPE) %>% # isolate only ASVs that are PRESENT at a site and sampletype
  distinct() %>% #unique this, as presense = present in at least 1 rep (where applicable)
  unite(sample_id, SITE, SAMPLETYPE, sep = "_") %>% 
  # select(-SITE) %>% 
  # distinct() %>% 
  add_column(present = 1) %>%
  pivot_wider(names_from = sample_id, values_from = present, values_fill = 0) %>% 
  rowwise() %>% 
  mutate_at(vars(FeatureID), factor)

Is an ASV present only at the vent site? plume? or background? What about background and plume only?

library(purrr)
any_cols <- function(tax_asv_id) reduce(tax_asv_id, `|`)

asv_class <- tax_asv_id %>%
  mutate(vent = ifelse(any_cols(across(contains("_Vent"), ~ . > 0)), "VENT", ""),
         plume= ifelse(any_cols(across(contains("_Plume"), ~ . > 0)), "PLUME", ""),
         bsw = ifelse(any_cols(across(contains("_Background"), ~ . > 0)), "BSW", ""),
         ) %>% 
  unite(class_tmp, vent, plume, bsw, sep = "_", na.rm = TRUE) %>% 
  mutate(CLASS = case_when(
  class_tmp == "VENT__" ~ "Vent only",
  class_tmp == "_PLUME_" ~ "Plume only",
  class_tmp == "__BSW" ~ "Background only",
  class_tmp == "VENT__BSW" ~ "Vent & background",
  class_tmp == "VENT_PLUME_BSW" ~ "Vent, plume, & background",
  class_tmp == "VENT_PLUME_" ~ "Vent & plume",
  class_tmp == "_PLUME_BSW" ~ "Plume & background"
  )) %>% 
  select(FeatureID, CLASS) %>% distinct()

colnames(tax_asv_id)

Binary data frame with 1 indicating presence of ASV (rows) in a given sample (columns)

Depending on prevalence of ASV, assign groupings of location.

asv_class_SITE <- tax_asv_id %>%
  mutate(mcr = ifelse(any_cols(across(contains("Piccard") | contains("VonDamm"), ~ . > 0)), "MCR", ""),
         axial = ifelse(any_cols(across(contains("Axial"), ~ . > 0)), "AxS", ""),
         gr = ifelse(any_cols(across(contains("Gorda"), ~ . > 0)), "GR", "")
         ) %>% 
  unite(class_tmp, mcr, axial, gr, sep = "_", na.rm = TRUE) %>% 
  mutate(SITE_CLASS = case_when(
  class_tmp == "__GR" ~ "Gorda Ridge only",
  class_tmp == "_AxS_" ~ "Axial only",
  class_tmp == "_AxS_GR" ~ "Axial & Gorda Ridge",
  class_tmp == "MCR__" ~ "Mid-Cayman Rise",
  class_tmp == "MCR__GR" ~ "Mid-Cayman Rise & Gorda Ridge",
  class_tmp == "MCR_AxS_" ~ "Mid-Cayman Rise & Axial",
  class_tmp == "MCR_AxS_GR" ~ "All sites"
  )) %>% 
  select(FeatureID, SITE_CLASS) %>% distinct()

Combine together with original ASV table

insitu_asv_wClass <- asv_insitu_qc %>% 
  left_join(asv_class) %>% 
  left_join(asv_class_SITE)
head(insitu_asv_wClass)

Visualize the total number of ASVs in background, plume, versus background.

# head(asv_insitu_qc)
# svg("bubbles.svg", h = 4, w = 8)
asv_insitu_qc %>% 
  select(DATASET, FeatureID, SAMPLETYPE) %>% 
  group_by(DATASET, SAMPLETYPE) %>% 
    summarise(COUNT = n_distinct(FeatureID)) %>% 
  ggplot(aes(x = DATASET, y = SAMPLETYPE, fill = SAMPLETYPE)) +
    geom_point(aes(size = COUNT), shape = 21, color = "black") +
    scale_size_continuous(range = c(4,20)) +
  scale_fill_viridis_d(option = "B") +
  theme_void() +
  theme(legend.position = "right",
        axis.text = element_text(color = "black"))
# dev.off()

Bubble plot reporting the total number of ASVs found in the vent, plume, versus background. At each site, the vent protist population had a higher total number of ASVs.

Repeat visualization by ASV distribution category.

# head(insitu_asv_wClass)
insitu_asv_wClass %>% 
  select(DATASET, FeatureID, SAMPLETYPE, CLASS) %>% 
  group_by(DATASET, SAMPLETYPE, CLASS) %>% 
    summarise(COUNT = n_distinct(FeatureID)) %>% 
  ggplot(aes(x = DATASET, y = SAMPLETYPE, fill = SAMPLETYPE)) +
    geom_point(aes(size = COUNT), shape = 21, color = "black") +
    scale_size_continuous(range = c(4,20)) +
  scale_fill_viridis_d(option = "B") +
  theme_void() +
  theme(legend.position = "right",
        axis.text.x = element_text(color = "black"),
        axis.title.y = element_blank()) +
  facet_grid(SAMPLETYPE + CLASS ~ ., scales = "free", space = "free") +
  labs(x = "", y = "", title = "Total number of ASVs by distribution & sample type")

Repeated bubble plot reports the total number of ASVs in the vent, plume, and background - but now further separated by distribution (i.e., if an ASV was found only in the vent and plume = “Vent & plume”). The largest portion of ASVs were found only at the vent sites (Vent only).

Categories for ASV distribution:

unique(insitu_asv_wClass$CLASS)
unique(insitu_asv_wClass$SITE_CLASS)

6 Save working ASV table

Checkpoint to save working dataframes.

save(asv_insitu, asv_insitu_qc, insitu_asv_wClass, file = "asv-tables-processed-18102021.RData")

7 PICK UP HERE FOR ANALYSIS OF SEQ

8 Functions to explore community diversity

To explore microbial eukaryotic community diversity at all three sites, below functions have been written to pass 18S data for each site through the same analysis. This will be done for all sites together and for them individually.

Sections below highlight Axial Seamount, Mid-Cayman Rise, and Gorda Ridge data individually.

axial <- c("Axial")
mcr <- c("VonDamm", "Piccard")
gr <- c("GordaRidge")
load("asv-tables-processed-18102021.RData")

8.1 Bar plot function

Create a bar plot showing the relative sequence abundance of 18S results to the Supergroup and Phylum level. Function averages across replicates and then sums to the phylum and supergroup level. Bar plot shows the relative sequence abundance.

make_bar_relabun <- function(df, selection){
  df_out <- df %>% 
  filter(SITE %in% selection) %>% 
  filter(Domain == "Eukaryota") %>%
  mutate(Supergroup = ifelse(is.na(Supergroup), "Unknown Eukaryota", Supergroup),
         Phylum = ifelse(is.na(Phylum), "Unknown", Phylum),
         Phylum = ifelse(Phylum == "Alveolata_X", "Ellobiopsidae", Phylum),
         Supergroup = ifelse(Supergroup == "Alveolata", paste(Supergroup, Phylum, sep = "-"), Supergroup)) %>% 
  group_by(FeatureID, Taxon, Domain, Supergroup, Phylum, Class, Order, Family, Genus, Species,
           VENT, SITE, SAMPLETYPE, YEAR, DATASET) %>% 
    summarise(SEQ_AVG_REP = mean(value)) %>% 
  ungroup()
  supergroup <- df_out %>% 
  group_by(SITE, SAMPLETYPE, VENT, YEAR, Supergroup) %>% 
    summarise(SEQ_SUM = sum(SEQ_AVG_REP)) %>% 
  ggplot(aes(x = VENT, y = SEQ_SUM, fill = Supergroup)) +
    geom_bar(stat = "identity", position = "fill", color = "black", width = 0.9) +
    facet_grid(. ~ SITE +YEAR + SAMPLETYPE, scale = "free", space = "free") +
  theme_linedraw() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1),
        strip.background = element_blank(), strip.text = element_text(color = "black"),
        legend.position = "right") +
  scale_y_continuous(expand = c(0,0)) +
  # scale_fill_brewer(palette = "Set2") +
  scale_fill_manual(values = c("#fa9fb5", "#c51b8a", "#67000d", "#ef3b2c", "#ffffcc", "#feb24c", "#c7e9b4", "#1d91c0", "#253494", "#9e9ac8", "#238b45", "#54278f", "#bdbdbd", "#252525", "#fa9fb5", "#c51b8a", "#67000d", "#ef3b2c", "#ffffcc", "#feb24c", "#c7e9b4", "#1d91c0", "#253494", "#9e9ac8", "#238b45", "#54278f", "#bdbdbd", "#252525")) +
  labs(x = "", y = "Relative abundance")
  
  phylum <- df_out %>% 
    unite(SupergroupPhylum, Supergroup, Phylum, sep = "-") %>% 
  group_by(SITE, SAMPLETYPE, VENT, YEAR, SupergroupPhylum) %>% 
    summarise(SEQ_SUM = sum(SEQ_AVG_REP)) %>% 
  ggplot(aes(x = VENT, y = SEQ_SUM, fill = SupergroupPhylum)) +
    geom_bar(stat = "identity", position = "fill", color = "black", width = 0.9) +
    facet_grid(. ~ SITE +YEAR + SAMPLETYPE, scale = "free", space = "free") +
  theme_linedraw() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1),
        strip.background = element_blank(), strip.text = element_text(color = "black"),
        legend.position = "right") +
  scale_y_continuous(expand = c(0,0)) +
  scale_fill_manual(values = c("#f1eef6", "#d7b5d8", "#df65b0", "#ce1256", "#fc9272", "#ef3b2c", 
    "#800026", "#fff7bc", "#fec44f", "#d95f0e", "#a63603", "#74c476", "#238b45", 
    "#00441b", "#7fcdbb", "#084081", "#c6dbef", "#2b8cbe", "#016c59", "#bcbddc", 
    "#807dba", "#54278f", "#bdbdbd", "black", "white", "#969696", "#525252", "#f1eef6", "#d7b5d8", "#df65b0", "#ce1256", "#fc9272", "#ef3b2c", 
    "#800026", "#fff7bc", "#fec44f", "#d95f0e", "#a63603", "#74c476", "#238b45", 
    "#00441b", "#7fcdbb", "#084081", "#c6dbef", "#2b8cbe", "#016c59", "#bcbddc", 
    "#807dba", "#54278f", "#bdbdbd", "black", "white")) +
  labs(x = "", y = "Relative abundance")
  supergroup + phylum + patchwork::plot_layout(ncol = 1)
}

# make_bar_relabun(insitu_asv_wClass, axial)

8.2 Tile plot

Relative abundance plots are misleading, as this tag-sequence data is compositional. To combat this, we can also perform a center log-ratio transformation of the sequence counts. This tile plot (or heat map) will show the relationship from the data mean. Positive values thus demonstrate an increase in the taxa, while negative values illustrate the opposite.

Ahead of the CLR transformation, average across replicates, then sum to the Class level. THEN perform CLR transformation and plot as heat map.

make_clr_trans_tile <- function(df, selection){
  df_wide <- df %>%
    filter(SITE %in% selection) %>%
  # df_wide <- insitu_asv_wClass %>% 
    # filter(SITE %in% axial) %>% 
    filter(Domain == "Eukaryota") %>% 
  # Average across replicates
      group_by(FeatureID, SAMPLENAME, VENT, Domain, Supergroup, Phylum, Class, Order, Family, Genus, Species) %>% 
        summarise(AVG = mean(value)) %>% 
      ungroup() %>% 
    # Sum to the Order taxonomic classification
    unite(SAMPLENAME_2, SAMPLENAME, VENT, sep = "_") %>% 
      group_by(SAMPLENAME_2, Supergroup, Phylum, Class) %>% 
        summarise(CLASS_SUM = sum(AVG)) %>% 
    unite(CLASS, Supergroup, Phylum, Class, sep = " ") %>% 
    select(CLASS, SAMPLENAME_2, CLASS_SUM) %>% 
    pivot_wider(names_from = SAMPLENAME_2, values_from = CLASS_SUM, values_fill = 0) %>% 
    column_to_rownames(var = "CLASS")
  ## Take wide data frame and CLR transform, pivot to wide, and plot
  data.frame(compositions::clr(df_wide)) %>% 
      rownames_to_column(var = "CLASS") %>%
      pivot_longer(cols = starts_with(selection), values_to = "CLR", names_to = "SAMPLENAME_2") %>% 
      separate(SAMPLENAME_2, c("SAMPLENAME", "VENT"), sep = "_") %>% 
      separate(SAMPLENAME, c("SITE", "SAMPLETYPE", "YEAR", "Sample_tmp"), remove = TRUE) %>% 
      mutate(VENT = str_replace_all(VENT, "\\.", " ")) %>% 
      mutate(REGION = case_when(
        SITE == "GordaRidge" ~ "Gorda Ridge",
        SITE %in% mcr ~ "Mid-Cayman Rise",
        # SITE == "Piccard" ~ "Piccard",
        # SITE == "VonDamm" ~ "Von Damm",
        SITE == "Axial" ~ "Axial")) %>% 
      mutate(VENTNAME = case_when(
        REGION == "Gorda Ridge" ~ VENT,
        REGION == "Mid-Cayman Rise" ~ paste(SITE, VENT, sep = " "),
        REGION == "Axial" ~ paste(VENT, YEAR, sep = " ")
      )) %>% select(-Sample_tmp) %>% 
    unite(SAMPLE, REGION, VENTNAME, sep = " ", remove = FALSE) %>% 
    separate(CLASS, c("Supergroup", "Phylum", "Class"), sep = " ", remove = FALSE) %>%
    ggplot(aes(x = SAMPLE, y = Class, fill = CLR)) +
      geom_tile(color = "#252525") + 
      theme(legend.position = "right", 
            panel.grid.major = element_blank(),
            panel.grid.minor = element_blank(), 
            panel.border = element_blank(), 
            panel.background = element_blank(),
            axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5, color = "black",size = 8), 
            axis.text.y = element_text(color = "black", size = 8),
            strip.background = element_blank(), 
            strip.text.y = element_text(hjust = 0, vjust = 0.5, angle = 0),
            # strip.text.x = element_blank(), 
            legend.title = element_blank()) + 
      labs(x = "", y = "") +
      scale_fill_gradient2(low = "#4575b4", mid = "white", high = "#d73027", na.value = "grey50") +
      facet_grid(Supergroup + Phylum ~ SAMPLETYPE, space = "free", scales = "free")
}

8.3 PCA

Similar to aove, the first step in this function transforms data using CLR (to ASV level though). First plot will show eigen values (scree plot to determine if 2 vs. 3 dimensions is best for data). Then function extracts data points and creates PCA plot.

make_pca <- function(df, selection){
 df_wide_asv <- df %>%
    # df_wide_asv <- insitu_asv_wClass %>%
  filter(SITE %in% selection) %>%
    # filter(SITE %in% axial) %>%
  filter(Domain == "Eukaryota") %>% 
  # Average across replicates
    group_by(FeatureID, SAMPLENAME, VENT) %>% 
    summarise(AVG = mean(value)) %>% 
    ungroup() %>% 
  separate(SAMPLENAME, c("SITE", "SAMPLETYPE", "YEAR", "Sample_tmp"), remove = TRUE) %>% 
  mutate(REGION = case_when(
    SITE == "GordaRidge" ~ "Gorda Ridge",
    SITE %in% mcr ~ "Mid-Cayman Rise",
    SITE == "Axial" ~ "Axial")) %>% 
  mutate(VENTNAME = case_when(
    REGION == "Gorda Ridge" ~ VENT,
    REGION == "Mid-Cayman Rise" ~ paste(SITE, VENT, sep = " "),
    REGION == "Axial" ~ paste(VENT, YEAR, sep = " ")
  )) %>% select(-Sample_tmp) %>% 
  unite(SAMPLE, SAMPLETYPE, REGION, VENTNAME, sep = "_", remove = FALSE) %>% 
   group_by(FeatureID, SAMPLE) %>% 
   summarise(SUM = sum(AVG)) %>% 
  pivot_wider(names_from = FeatureID, values_from = SUM, values_fill = 0) %>% 
  column_to_rownames(var = "SAMPLE")
  # look at eigenvalues
  pca_lr <- prcomp(data.frame(compositions::clr(df_wide_asv)))
  variance_lr <- (pca_lr$sdev^2)/sum(pca_lr$sdev^2)
  ## View bar plot
  barplot(variance_lr, main = "Log-Ratio PCA Screeplot", xlab = "PC Axis", ylab = "% Variance", 
    cex.names = 1.5, cex.axis = 1.5, cex.lab = 1.5, cex.main = 1.5)
  ## Extract PCR points
  data.frame(pca_lr$x, SAMPLE = rownames(pca_lr$x)) %>% 
      separate(SAMPLE, c("SAMPLETYPE", "REGION", "VENTNAME"), sep = "_", remove = FALSE) %>% 
  ## Generate PCA plot
  ggplot(aes(x = PC1, y = PC2, shape = SAMPLETYPE, fill = VENTNAME)) +
    geom_hline(yintercept = 0) + geom_vline(xintercept = 0, color = "#525252") +
    geom_point(size=4, stroke = 1, aes(fill = VENTNAME)) +
    ylab(paste0('PC2 ',round(variance_lr[2]*100,2),'%')) +
    xlab(paste0('PC1 ',round(variance_lr[1]*100,2),'%')) +
    scale_shape_manual(values = c(21, 23, 24)) +
    # scale_fill_manual(values = fill_color) +
    # scale_color_manual(values = color_color) +
    theme_bw() +
    theme(axis.text = element_text(color="black", size=12),
          legend.title = element_blank(),
          axis.title = element_text(color="black", size=14),
          legend.text = element_text(color = "black", size = 14),
          plot.margin = margin(2, 1, 2, 1, "cm")) +
    guides(fill = guide_legend(override.aes = list(shape = 21) ),
              shape = guide_legend(override.aes = list(fill = "black")))
  }

8.4 ASV richness

From complete dataset, average across replicates, then sum the total number of ASVs in each sample. Then plot a data point for total number of ASVs (ASV richness) by sample type - where sample type represents the vent, plume, vs. background. Box plots show the median and range.

make_asv_rich <- function(df, selection){
  df %>% 
  filter(SITE %in% selection) %>%
  filter(Domain == "Eukaryota") %>% 
  # Average across replicates
    group_by(FeatureID, SAMPLENAME, VENT, Supergroup) %>% 
    summarise(AVG = mean(value)) %>% 
  separate(SAMPLENAME, c("SITE", "SAMPLETYPE", "YEAR", "Sample_tmp"), remove = TRUE) %>% 
  mutate(REGION = case_when(
    SITE == "GordaRidge" ~ "Gorda Ridge",
    SITE %in% mcr ~ "Mid-Cayman Rise",
    SITE == "Axial" ~ "Axial")) %>% 
  mutate(VENTNAME = case_when(
    REGION == "Gorda Ridge" ~ VENT,
    REGION == "Mid-Cayman Rise" ~ paste(SITE, VENT, sep = " "),
    REGION == "Axial" ~ paste(VENT, YEAR, sep = " ")
  )) %>% select(-Sample_tmp) %>% 
  unite(SAMPLE, REGION, VENTNAME, sep = " ", remove = FALSE) %>% 
  ungroup() %>% 
  group_by(SITE, REGION, SAMPLE, SAMPLETYPE) %>% 
    summarise(NUM_ASV = n()) %>% 
  ggplot(aes(x = SAMPLETYPE, y = NUM_ASV, shape = SAMPLETYPE)) +
  geom_boxplot(aes(group = SAMPLETYPE), alpha = 0.8, width = 0.4) +
  geom_jitter(size=2, aes(fill = SITE)) +
  scale_shape_manual(values = c(21, 23, 24)) +
  theme_bw() +
  theme(axis.text = element_text(color="black", size=12),
        legend.title = element_blank(),
        axis.title = element_text(color="black", size=14),
        legend.text = element_text(color = "black", size = 14)) +
  guides(fill = guide_legend(override.aes = list(shape = 21) ),
            shape = guide_legend(override.aes = list(fill = "black") ) ) +
  labs(x = "", y = "Total number of ASVs")
}

8.5 Presence-absence

Bar plot (colors correspond to Supergroup) represents the number of ASVs shared or unique to each sample. Combination matrix below bars shows which samples are considered for the bar plot.

make_upset_plot <- function(df, selection){ 
  df %>% 
  filter(SITE %in% selection) %>%
  filter(Domain == "Eukaryota") %>% 
  # Average across replicates
    group_by(FeatureID, SAMPLENAME, VENT, Supergroup) %>% 
    summarise(AVG = mean(value)) %>% 
  separate(SAMPLENAME, c("SITE", "SAMPLETYPE", "YEAR", "Sample_tmp"), remove = TRUE) %>% 
  mutate(REGION = case_when(
    SITE == "GordaRidge" ~ "Gorda Ridge",
    SITE %in% mcr ~ "Mid-Cayman Rise",
    SITE == "Axial" ~ "Axial")) %>% 
  mutate(VENTNAME = case_when(
    REGION == "Gorda Ridge" ~ VENT,
    REGION == "Mid-Cayman Rise" ~ paste(SITE, VENT, sep = " "),
    REGION == "Axial" ~ paste(VENT, YEAR, sep = " ")
  )) %>% select(-Sample_tmp) %>% 
  unite(SAMPLE, SITE, VENTNAME, sep = " ", remove = FALSE) %>% 
  distinct(FeatureID, Supergroup, AVG, SAMPLE, .keep_all = TRUE) %>% 
  group_by(FeatureID, Supergroup) %>% 
  summarise(SAMPLE = list(SAMPLE)) %>% 
  ggplot(aes(x = SAMPLE)) +
    geom_bar(color = "black", width = 0.5, aes(fill = Supergroup)) +
    scale_x_upset(n_intersections = 35) +
  scale_y_continuous(expand = c(0,0)) +
  labs(x = "", y = "Shared ASVs") +
  theme_linedraw() +
  theme(axis.text = element_text(color="black", size=10),
        axis.title = element_text(color="black", size=10),
        legend.text = element_text(color = "black", size = 10),
        plot.margin = margin(1, 1, 1, 5, "cm")) +
    scale_fill_manual(values = c("#f1eef6", "#d7b5d8", "#df65b0", "#ce1256", "#fc9272", "#ef3b2c", 
    "#800026", "#fff7bc", "#fec44f", "#d95f0e", "#a63603", "#74c476", "#238b45", 
    "#00441b", "#7fcdbb", "#084081", "#c6dbef", "#2b8cbe", "#016c59", "#bcbddc", 
    "#807dba", "#54278f", "#bdbdbd", "black"))
}

9 Axial Seamount analysis

  • What should I do with the year to year comparison? Individual analyses here for Axial, MCR, and GR will go in supplementary.

9.1 Bar plot relative abundance: Axial

make_bar_relabun(insitu_asv_wClass, axial)

Axial Seamount samples from archived material - span 2013, 2014, and 2015. First, the background and plume (from 2015 only, and from plume associated with the Anemone vent) are different from the vent samples - overwhelmingly stramneopile and rhizaria. For the background and plume, the stramenopiles appear to be associated with ochrophyta or opalozoa. For the plume, the rhizaria population was associated with cercozoa, while the background seawater was identified as belonging to radiolaria.

The major difference between the background/plume and vent sites was the higher relative sequence abundance of ciliates and opisthokonta. For the opisthokonta, these are primarily metazoa - and I will need to investigate this further. Exceptions for this include the ‘Dependable’ vent from 2013, which had a completely different composition, and ‘Marker 113’ in 2015, which the opisthokonta sequences were assigned choanoflagellate and fungi.

Further questions to consider

Any geochemical changes to Marker 113 from 2013/2014 to 2015? Could attribute difference of opisthokonta colonization.

Need to get metadata

9.2 Tile plot CLR: Axial

make_clr_trans_tile(insitu_asv_wClass, axial)
## Warning: Expected 4 pieces. Additional pieces discarded in 1872 rows [1, 2, 3,
## 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, ...].

Tile plot goes to the Class taxonomic level. Here at Axial, mostly the ciliate class had higher CLR values (more enriched relative to the data mean). Second to ciliates were cercozoa. Also noticing how Marker 113 2013 and 2015 are more similar to each other than 2014?

9.3 PCA: Axial

make_pca(insitu_asv_wClass, axial)
## Warning: Expected 4 pieces. Additional pieces discarded in 7717 rows [1, 2, 3,
## 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, ...].

While we only have 1 plume and bsw each for Axial, they are grouping together - away from vents. So that is an expected signature and likely consistent with the other sites. These colors are a little confusing, it does look like Boca is an outlier.

9.4 ASV richness: Axial

make_asv_rich(insitu_asv_wClass, axial)
## Warning: Expected 4 pieces. Additional pieces discarded in 7717 rows [1, 2, 3,
## 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, ...].

We only have 1 sample for background and plume from Axial Seamount. But this shows that the vent sites have varied ASV richness,

9.5 Presence-absence: Axial

make_upset_plot(insitu_asv_wClass, axial)
## Warning: Expected 4 pieces. Additional pieces discarded in 7717 rows [1, 2, 3,
## 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, ...].
## Warning: Removed 876 rows containing non-finite values (stat_count).

10 Mid-Cayman Rise

10.1 Bar plot relative abundance: MCR

make_bar_relabun(insitu_asv_wClass, mcr)

10.2 Tile plot CLR: MCR

make_clr_trans_tile(insitu_asv_wClass, mcr)
## Warning: Expected 4 pieces. Additional pieces discarded in 1794 rows [1, 2, 3,
## 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, ...].

10.3 PCA: MCR

make_pca(insitu_asv_wClass, mcr)
## Warning: Expected 4 pieces. Additional pieces discarded in 8327 rows [1, 2, 3,
## 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, ...].

10.4 ASV richness: MCR

make_asv_rich(insitu_asv_wClass, mcr)
## Warning: Expected 4 pieces. Additional pieces discarded in 8327 rows [1, 2, 3,
## 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, ...].

10.5 Presence-absence: MCR

make_upset_plot(insitu_asv_wClass, mcr)
## Warning: Expected 4 pieces. Additional pieces discarded in 8327 rows [1, 2, 3,
## 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, ...].
## Warning: Removed 907 rows containing non-finite values (stat_count).

Repeat presence-absence plot, but with a lower resolution.

insitu_asv_wClass %>% 
  filter(SITE %in% mcr) %>%
  filter(Domain == "Eukaryota") %>% 
  # Average across replicates
    group_by(FeatureID, SAMPLENAME, VENT, Supergroup) %>% 
    summarise(AVG = mean(value)) %>% 
  separate(SAMPLENAME, c("SITE", "SAMPLETYPE", "YEAR", "Sample_tmp"), remove = TRUE) %>% 
  mutate(REGION = case_when(
    SITE == "GordaRidge" ~ "Gorda Ridge",
    SITE %in% mcr ~ "Mid-Cayman Rise",
    SITE == "Axial" ~ "Axial")) %>% 
  mutate(VENTNAME = case_when(
    REGION == "Gorda Ridge" ~ VENT,
    REGION == "Mid-Cayman Rise" ~ paste(SITE, VENT, sep = " "),
    REGION == "Axial" ~ paste(VENT, YEAR, sep = " ")
  )) %>% select(-Sample_tmp) %>% 
  unite(SAMPLE, SITE, SAMPLETYPE, sep = " ", remove = FALSE) %>% 
  group_by(FeatureID, Supergroup, SAMPLE) %>% 
    summarise(SUM = sum(AVG)) %>%
  ungroup() %>% 
  distinct(FeatureID, Supergroup, SUM, SAMPLE, .keep_all = TRUE) %>% 
  group_by(FeatureID, Supergroup) %>% 
  summarise(SAMPLE = list(SAMPLE)) %>% 
  ggplot(aes(x = SAMPLE)) +
    geom_bar(color = "black", width = 0.5, aes(fill = Supergroup)) +
    scale_x_upset(n_intersections = 15) +
  scale_y_continuous(expand = c(0,0)) +
  labs(x = "", y = "Shared ASVs") +
  theme_linedraw() +
  theme(axis.text = element_text(color="black", size=10),
        axis.title = element_text(color="black", size=10),
        legend.text = element_text(color = "black", size = 10),
        plot.margin = margin(1, 1, 1, 5, "cm")) +
    scale_fill_manual(values = c("#f1eef6", "#d7b5d8", "#df65b0", "#ce1256", "#fc9272", "#ef3b2c", 
    "#800026", "#fff7bc", "#fec44f", "#d95f0e", "#a63603", "#74c476", "#238b45", 
    "#00441b", "#7fcdbb", "#084081", "#c6dbef", "#2b8cbe", "#016c59", "#bcbddc", 
    "#807dba", "#54278f", "#bdbdbd", "black"))
## Warning: Expected 4 pieces. Additional pieces discarded in 8327 rows [1, 2, 3,
## 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, ...].
## Warning: Removed 331 rows containing non-finite values (stat_count).

10.6 MCR only plot to look at parameters and sequence information

import_mcr <- read_delim(file = "../../Mid-Cayman_Rise/midcayman-rise-microeuk/table-wcalc.txt", delim = "\t")
# head(import_mcr)
# mcr_metadata <- import_mcr %>% 
#   select(GRAZING_EFFECT_hr)
#   unite(type_site, "2020", _vent_, " ")
# View(asv_insitu_qc)
# unique(tmp$SAMPLENAME)

plot_bubble <- function(VARIABLE){
  asv_insitu_qc %>% 
    filter(SITE %in% mcr) %>% 
    filter(Domain == "Eukaryota") %>% 
    filter(value > 0) %>% 
  # Average across replicates
      group_by(SAMPLENAME, VENT) %>% 
        summarise(SUM = sum(value),
                  ASV_COUNT = n_distinct(FeatureID),
                  TEMP_avg = mean(TEMP),
                  PROK_avg = mean(ProkConc)) %>% 
      ungroup() %>% 
  pivot_longer(cols = c(SUM, ASV_COUNT, TEMP_avg, PROK_avg)) %>% 
  filter(name == VARIABLE) %>% 
  ggplot(aes(x = SAMPLENAME, y = name, size = value)) +
    geom_point(shape = 21, color = "black", aes(size = value)) +
    scale_size_continuous(range = c(1,16)) +
    theme_void() +
    theme(axis.text.x = element_text(color = "black", angle = 45, hjust = 1, vjust = 1),
          axis.text.y = element_text(color = "black"),
          legend.title = element_blank())
}
# plot_grid(
#   plot_bubble("ASV_COUNT") + theme(axis.text.x = element_blank()),
#   plot_bubble("SUM") + theme(axis.text.x = element_blank()),
#   plot_bubble("PROK_avg") + theme(axis.text.x = element_blank()),                           
#   plot_bubble("TEMP_avg"),
#   ncol = 1,
#   align = c("hv"),
#   axis = c("lrtb")
# )
plot_bubble("ASV_COUNT") + theme(axis.text.x = element_blank()) +
  plot_bubble("SUM") + theme(axis.text.x = element_blank()) +
  plot_bubble("PROK_avg") + theme(axis.text.x = element_blank()) + 
  plot_bubble("TEMP_avg") + patchwork::plot_layout(ncol = 1)
## Warning: Removed 4 rows containing missing values (geom_point).

## Warning: Removed 4 rows containing missing values (geom_point).

# ?plot_grid()

11 Gorda Ridge

## Bar plot relative abundance: GR

make_bar_relabun(insitu_asv_wClass, gr)

11.1 Tile plot CLR: GR

make_clr_trans_tile(insitu_asv_wClass, gr)
## Warning: Expected 4 pieces. Additional pieces discarded in 2210 rows [1, 2, 3,
## 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, ...].

11.2 PCA: GR

make_pca(insitu_asv_wClass, gr)
## Warning: Expected 4 pieces. Additional pieces discarded in 9456 rows [1, 2, 3,
## 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, ...].

11.3 ASV richness: GR

make_asv_rich(insitu_asv_wClass, gr)
## Warning: Expected 4 pieces. Additional pieces discarded in 9456 rows [1, 2, 3,
## 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, ...].

11.4 Presence-absence: GR

make_upset_plot(insitu_asv_wClass, gr)
## Warning: Expected 4 pieces. Additional pieces discarded in 9456 rows [1, 2, 3,
## 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, ...].
## Warning: Removed 926 rows containing non-finite values (stat_count).

12 Compare all three sites

all <- c("Axial", "VonDamm", "Piccard", "GordaRidge")
mcr <- c("VonDamm", "Piccard")

12.1 Bar plot relative abundance: all

make_bar_relabun(insitu_asv_wClass, all)

### Tree map - simplier taxonomic composition

library(treemapify)
# unique(tmp$SUPERGROUP)

Filter data to reduce noise and show sample type to vent ecosystem variability.

alv <- c("Alveolata-Ellobiopsidae", "Alveolata-Perkinsea", "Alveolata-Unknown", "Alveolata-Chrompodellids", "Alveolata-Apicomplexa")
bkgd <- c("Deep seawater", "BSW", "Shallow seawater")
plume <- c("Candelabra Plume", "Mt Edwards Plume", "Plume", "Near vent BW")

to_supergroup <- insitu_asv_wClass %>% 
  filter(SITE %in% all) %>% 
  filter(Domain == "Eukaryota") %>%
  filter(Supergroup != "Opisthokonta") %>% 
  mutate(Supergroup = ifelse(is.na(Supergroup), "Unknown Eukaryota", Supergroup),
         Phylum = ifelse(is.na(Phylum), "Unknown", Phylum),
         Phylum = ifelse(Phylum == "Alveolata_X", "Ellobiopsidae", Phylum),
         Supergroup = ifelse(Supergroup == "Alveolata", paste(Supergroup, Phylum, sep = "-"), Supergroup)) %>% 
  mutate(SUPERGROUP = case_when(
    Supergroup %in% alv ~ "Other Alveolata",
    Supergroup == "Eukaryota_X" ~ "Unknown Eukaryota",
    Phylum == "Cercozoa" ~ "Rhizaria-Cercozoa",
    Phylum == "Radiolaria" ~ "Rhizaria-Radiolaria",
    Phylum == "Ochrophyta" ~ "Stramenopiles-Ochrophyta",
    Phylum == "Opalozoa" ~ "Stramenopiles-Opalozoa",
    Phylum == "Sagenista" ~ "Stramenopiles-Sagenista",
    TRUE ~ Supergroup
  )) %>% 
  mutate(SAMPLETYPEORDER = case_when(
    VENT %in% bkgd ~ "Background",
    VENT %in% plume ~ "Plume",
    TRUE ~ "Vent"
  )) %>% 
  group_by(FeatureID, Taxon, SUPERGROUP,
           VENT, SITE, SAMPLETYPE, YEAR, DATASET, SAMPLETYPEORDER) %>% 
    summarise(SEQ_AVG_REP = mean(value)) %>% 
  ungroup() %>% 
  group_by(SITE, SUPERGROUP, SAMPLETYPEORDER) %>% 
    summarise(SEQ_SUM = sum(SEQ_AVG_REP)) %>% 
  filter(SEQ_SUM > 200)

to_supergroup$SAMPLETYPEORDER <- factor(to_supergroup$SAMPLETYPEORDER, levels = c("Background", "Plume", "Vent"))

# View(to_supergroup)
to_supergroup %>% 
  ggplot(aes(area = SEQ_SUM, fill = SUPERGROUP, subgroup = SUPERGROUP)) +
    geom_treemap(color = "white") +
    geom_treemap_subgroup_border(colour = "white", size = 2) +
    # geom_bar(stat = "identity", position = "fill", color = "black", width = 0.9) +
    facet_grid(SITE ~ SAMPLETYPEORDER) +
  theme_linedraw() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1),
        strip.background = element_blank(), strip.text = element_text(color = "black"),
        legend.position = "right",
        legend.title = element_blank(),
        panel.border = element_blank()) +
  scale_y_continuous(expand = c(0,0)) +
  scale_fill_manual(values = c("#fa9fb5", "#c51b8a", "#67000d", "#ef3b2c", "#ffffcc", "#feb24c", "#c7e9b4", "#1d91c0", "#253494", "#9e9ac8", "#238b45", "#54278f", "#bdbdbd", "#252525", "#fa9fb5", "#c51b8a", "#67000d", "#ef3b2c", "#ffffcc", "#feb24c", "#c7e9b4", "#1d91c0", "#253494", "#9e9ac8", "#238b45", "#54278f", "#bdbdbd", "#252525")) +
  labs(x = "", y = "Sequence proportion by Supergroup")

12.2 Tile plot CLR: all

A better approach down below after isolating the vent-only ASVs.

# make_clr_trans_tile(insitu_asv_wClass, all)

12.3 PCA: all

make_pca(insitu_asv_wClass, all)
## Warning: Expected 4 pieces. Additional pieces discarded in 25500 rows [1, 2, 3,
## 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, ...].

12.4 PCA analysis at another level

Repeat, but color by Region and sample type.

df_wide_asv <- insitu_asv_wClass %>%
  filter(SITE %in% all) %>%
  filter(Domain == "Eukaryota") %>% 
  # Average across replicates
    group_by(FeatureID, SAMPLENAME, VENT) %>% 
    summarise(AVG = mean(value)) %>% 
    ungroup() %>% 
  separate(SAMPLENAME, c("SITE", "SAMPLETYPE", "YEAR", "Sample_tmp"), remove = TRUE) %>% 
  mutate(REGION = case_when(
    SITE == "GordaRidge" ~ "Gorda Ridge",
    SITE %in% mcr ~ "Mid-Cayman Rise",
    SITE == "Axial" ~ "Axial")) %>% 
  mutate(VENTNAME = case_when(
    REGION == "Gorda Ridge" ~ VENT,
    REGION == "Mid-Cayman Rise" ~ paste(SITE, VENT, sep = " "),
    REGION == "Axial" ~ paste(VENT, YEAR, sep = " ")
  )) %>% select(-Sample_tmp) %>% 
  unite(SAMPLE, SAMPLETYPE, SITE, VENTNAME, sep = "_", remove = FALSE) %>% 
   group_by(FeatureID, SAMPLE) %>% 
   summarise(SUM = sum(AVG)) %>% 
  pivot_wider(names_from = FeatureID, values_from = SUM, values_fill = 0) %>% 
  column_to_rownames(var = "SAMPLE")
## Warning: Expected 4 pieces. Additional pieces discarded in 25500 rows [1, 2, 3,
## 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, ...].
  # look at eigenvalues
  pca_lr <- prcomp(data.frame(compositions::clr(df_wide_asv)))
  variance_lr <- (pca_lr$sdev^2)/sum(pca_lr$sdev^2)
  ## View bar plot
  barplot(variance_lr, main = "Log-Ratio PCA Screeplot", xlab = "PC Axis", ylab = "% Variance", 
    cex.names = 1.5, cex.axis = 1.5, cex.lab = 1.5, cex.main = 1.5)

  ## Extract PCR points
  data.frame(pca_lr$x, SAMPLE = rownames(pca_lr$x)) %>% 
      separate(SAMPLE, c("SAMPLETYPE", "REGION", "VENTNAME"), sep = "_", remove = FALSE) %>% 
  ## Generate PCA plot
  ggplot(aes(x = PC1, y = PC2, shape = SAMPLETYPE, fill = REGION)) +
    geom_hline(yintercept = 0) + geom_vline(xintercept = 0, color = "#525252") +
    geom_point(size=3, stroke = 1, aes(fill = REGION)) +
    ylab(paste0('PC2 ',round(variance_lr[2]*100,2),'%')) +
    xlab(paste0('PC1 ',round(variance_lr[1]*100,2),'%')) +
    scale_shape_manual(values = c(21, 23, 24)) +
    scale_fill_manual(values = c("#fdbb84", "#31a354", "#ef3b2c", "#02818a")) +
    theme_bw() +
    theme(axis.text = element_text(color="black", size=12),
          legend.title = element_blank(),
          axis.title = element_text(color="black", size=14),
          legend.text = element_text(color = "black", size = 14),
          plot.margin = margin(2, 1, 2, 1, "cm")) +
    guides(fill = guide_legend(override.aes = list(shape = 21) ),
              shape = guide_legend(override.aes = list(fill = "black")))

12.5 PCoA: ilr transformed to Euclidean dist

# head(df_wide_asv)

12.6 Dendrogram comparison

Modify sample names for dendrogram plot.

df <- as.data.frame(t(df_wide_asv))
###
  colnames(df) <- gsub(x = names(df), pattern = "_", replacement = " ")
  colnames(df) <- gsub(x = names(df), pattern = "Vent Axial", replacement = "Axial")
  colnames(df) <- gsub(x = names(df), pattern = "Vent GordaRidge", replacement = "Gorda Ridge")
  colnames(df) <- gsub(x = names(df), pattern = "GordaRidge", replacement = "Gorda Ridge")
  colnames(df) <- gsub(x = names(df), pattern = "Plume ", replacement = "")
  colnames(df) <- gsub(x = names(df), pattern = "Vent VonDamm VonDamm", replacement = "Von Damm")
  colnames(df) <- gsub(x = names(df), pattern = "Vent Piccard Piccard", replacement = "Piccard")
  colnames(df) <- gsub(x = names(df), pattern = "Background GordaRidge", replacement = "Gorda Ridge")
  colnames(df) <- gsub(x = names(df), pattern = "VonDamm VonDamm", replacement = "Von Damm")
  colnames(df) <- gsub(x = names(df), pattern = "Piccard Piccard", replacement = "Piccard")
  colnames(df) <- gsub(x = names(df), pattern = " BSW", replacement = "")
  colnames(df) <- gsub(x = names(df), pattern = "Background Axial Deep seawater 2015", replacement = "Background Axial 2015")

# Write over same data frame - fix sample names
dendro_input <- df
# head(dendro_input)

Estimate Jaccard distance

# ?vegan::decostand
# ?vegdist
dendro_jacc <- vegan::vegdist(t(dendro_input), method = "jaccard")
# head(dendro_jacc)

cluster_jacc <- hclust(dist(t(dendro_jacc)), method = "average")

library(ggdendro)
dendro_plot_df <- ggdendro::dendro_data(as.dendrogram(cluster_jacc), type = "rectangle")
label_dendro_order <- as.character(dendro_plot_df$labels$label)
# label_dendro_order

Plot dendrogram

dendro_plot_output <- ggplot(segment(dendro_plot_df)) + 
  geom_segment(aes(x = x, y = y, xend = xend, yend = yend)) + 
  coord_flip() + 
  scale_y_reverse(expand = c(0.2, 0.5), breaks = c(0, 0.2, 0.4, 0.6, 0.8)) + 
  geom_text(aes(x = x, y = y, label = label, angle = 0, hjust = 0), data = label(dendro_plot_df)) + 
  theme_dendro() + 
  labs(y = "Dissimilarity", title = "Jaccard distance") + 
  theme(axis.text.x = element_text(color = "black", size = 14), 
        axis.line.x = element_line(color = "#252525"), 
        axis.ticks.x = element_line(), axis.title.x = element_text(color = "black", size = 14),
        plot.margin = margin(1, 1, 1, 1, "cm"))

Add bar plot in the same order to show proportion of resident versus cosmpolitan ASVs in each sample.

12.6.1 Bar plot to pair with dendrogram

# head(insitu_asv_wClass)
# unique(insitu_asv_wClass$SITE_CLASS)
# unique(insitu_asv_wClass$CLASS)

dendro_bar <- insitu_asv_wClass %>% 
  filter(SITE %in% all) %>%
  filter(Domain == "Eukaryota") %>% 
  # Average across replicates
    group_by(FeatureID, SAMPLENAME, VENT, SITE_CLASS) %>% 
    summarise(AVG = mean(value)) %>% 
    ungroup() %>% 
  separate(SAMPLENAME, c("SITE", "SAMPLETYPE", "YEAR", "Sample_tmp"), remove = TRUE) %>% 
  mutate(REGION = case_when(
    SITE == "GordaRidge" ~ "Gorda Ridge",
    SITE %in% mcr ~ "Mid-Cayman Rise",
    SITE == "Axial" ~ "Axial")) %>% 
  mutate(VENTNAME = case_when(
    REGION == "Gorda Ridge" ~ VENT,
    REGION == "Mid-Cayman Rise" ~ paste(SITE, VENT, sep = " "),
    REGION == "Axial" ~ paste(VENT, YEAR, sep = " ")
  )) %>% select(-Sample_tmp) %>% 
  unite(SAMPLE, SAMPLETYPE, SITE, VENTNAME, sep = "_", remove = FALSE) %>%
  group_by(SITE_CLASS, SAMPLE) %>% 
  summarise(SEQ_SUM = sum(AVG),
     ASV_COUNT = n()) %>% 
      mutate(SAMPLE = gsub(SAMPLE, pattern = "_", replacement = " ")) %>%
      mutate(SAMPLE = gsub(SAMPLE, pattern = "Vent Axial", replacement = "Axial")) %>%
      mutate(SAMPLE = gsub(SAMPLE, pattern = "Vent GordaRidge", replacement = "Gorda Ridge")) %>%
      mutate(SAMPLE = gsub(SAMPLE, pattern = "GordaRidge", replacement = "Gorda Ridge")) %>%
      mutate(SAMPLE = gsub(SAMPLE, pattern = "Plume ", replacement = "")) %>%
      mutate(SAMPLE = gsub(SAMPLE, pattern = "Vent VonDamm VonDamm", replacement = "Von Damm")) %>%
      mutate(SAMPLE = gsub(SAMPLE, pattern = "Vent Piccard Piccard", replacement = "Piccard")) %>%
      mutate(SAMPLE = gsub(SAMPLE, pattern = "Background GordaRidge", replacement = "Gorda Ridge")) %>%
      mutate(SAMPLE = gsub(SAMPLE, pattern = "VonDamm VonDamm", replacement = "Von Damm")) %>%
      mutate(SAMPLE = gsub(SAMPLE, pattern = "Piccard Piccard", replacement = "Piccard")) %>%
      mutate(SAMPLE = gsub(SAMPLE, pattern = " BSW", replacement = "")) %>%
      mutate(SAMPLE = gsub(SAMPLE, pattern = "Background Axial Deep seawater 2015", replacement = "Background Axial 2015"))
## Warning: Expected 4 pieces. Additional pieces discarded in 25500 rows [1, 2, 3,
## 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, ...].
# unique(insitu_asv_wClass$CLASS)
cosmo <- c("Vent, plume, & background", "Vent & background", "Vent & plume", "Plume & background")
dendro_res_cos_df <- insitu_asv_wClass %>% 
  filter(SITE %in% all) %>%
  filter(Domain == "Eukaryota") %>% 
  mutate(DISTRIBUTION = case_when(
    CLASS %in% cosmo ~ "Cosmopolitan",
    TRUE ~ CLASS
  )) %>% 
  # Average across replicates
    group_by(FeatureID, SAMPLENAME, VENT, DISTRIBUTION) %>% 
    summarise(AVG = mean(value)) %>% 
    ungroup() %>% 
  separate(SAMPLENAME, c("SITE", "SAMPLETYPE", "YEAR", "Sample_tmp"), remove = TRUE) %>% 
  mutate(REGION = case_when(
    SITE == "GordaRidge" ~ "Gorda Ridge",
    SITE %in% mcr ~ "Mid-Cayman Rise",
    SITE == "Axial" ~ "Axial")) %>% 
  mutate(VENTNAME = case_when(
    REGION == "Gorda Ridge" ~ VENT,
    REGION == "Mid-Cayman Rise" ~ paste(SITE, VENT, sep = " "),
    REGION == "Axial" ~ paste(VENT, YEAR, sep = " ")
  )) %>% select(-Sample_tmp) %>% 
  unite(SAMPLE, SAMPLETYPE, SITE, VENTNAME, sep = "_", remove = FALSE) %>%
  group_by(DISTRIBUTION, SAMPLE) %>% 
  summarise(SEQ_SUM = sum(AVG),
     ASV_COUNT = n()) %>% 
      mutate(SAMPLE = gsub(SAMPLE, pattern = "_", replacement = " ")) %>%
      mutate(SAMPLE = gsub(SAMPLE, pattern = "Vent Axial", replacement = "Axial")) %>%
      mutate(SAMPLE = gsub(SAMPLE, pattern = "Vent GordaRidge", replacement = "Gorda Ridge")) %>%
      mutate(SAMPLE = gsub(SAMPLE, pattern = "GordaRidge", replacement = "Gorda Ridge")) %>%
      mutate(SAMPLE = gsub(SAMPLE, pattern = "Plume ", replacement = "")) %>%
      mutate(SAMPLE = gsub(SAMPLE, pattern = "Vent VonDamm VonDamm", replacement = "Von Damm")) %>%
      mutate(SAMPLE = gsub(SAMPLE, pattern = "Vent Piccard Piccard", replacement = "Piccard")) %>%
      mutate(SAMPLE = gsub(SAMPLE, pattern = "Background GordaRidge", replacement = "Gorda Ridge")) %>%
      mutate(SAMPLE = gsub(SAMPLE, pattern = "VonDamm VonDamm", replacement = "Von Damm")) %>%
      mutate(SAMPLE = gsub(SAMPLE, pattern = "Piccard Piccard", replacement = "Piccard")) %>%
      mutate(SAMPLE = gsub(SAMPLE, pattern = " BSW", replacement = "")) %>%
      mutate(SAMPLE = gsub(SAMPLE, pattern = "Background Axial Deep seawater 2015", replacement = "Background Axial 2015"))
## Warning: Expected 4 pieces. Additional pieces discarded in 25500 rows [1, 2, 3,
## 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, ...].
dendro_bar$SAMPLE_ORDER <- factor(dendro_bar$SAMPLE, levels = label_dendro_order) 
dendro_res_cos_df$SAMPLE_ORDER <- factor(dendro_res_cos_df$SAMPLE, levels = label_dendro_order)
# head(dendro_bar)
dendro_bar_plot <- ggplot(dendro_bar, aes(x = SAMPLE_ORDER, y = ASV_COUNT, fill = SITE_CLASS)) +
  geom_bar(stat = "identity", position = "fill", color = "black") +
  coord_flip() +
  scale_y_continuous(expand = c(0,0)) +
  scale_fill_manual(values = c("grey", "#e6550d", "#fdbb84", "#31a354", "#1c9099", "#fde0dd", "#c51b8a")) +
    theme_bw() +
    theme(axis.text = element_text(color="black", size=12),
          legend.title = element_blank(),
          axis.title = element_text(color="black", size=14),
          legend.text = element_text(color = "black", size = 12),
          plot.margin = margin(1, 1, 1, 1, "cm"),
          legend.position = "top") +
  labs(x = "", y = "Proportion of ASVs")

dendro_bar_plot_res_cos <- ggplot(dendro_res_cos_df, aes(x = SAMPLE_ORDER, y = ASV_COUNT, fill = DISTRIBUTION)) +
  geom_bar(stat = "identity", position = "fill", color = "black") +
  coord_flip() +
  scale_y_continuous(expand = c(0,0)) +
  viridis::scale_fill_viridis(discrete=TRUE, option = "H") +
    theme_bw() +
    theme(axis.text = element_text(color="black", size=12),
          legend.title = element_blank(),
          axis.title = element_text(color="black", size=14),
          legend.text = element_text(color = "black", size = 12),
          plot.margin = margin(1, 1, 1, 1, "cm"),
          legend.position = "top") +
  labs(x = "", y = "Proportion of ASVs")
# dendro_bar_plot_res_cos
# ?scale_fill_viridis

dendro_bar_plot_SEQ <- ggplot(dendro_bar, aes(x = SAMPLE_ORDER, y = SEQ_SUM, fill = SITE_CLASS)) +
  geom_bar(stat = "identity", position = "fill", color = "black") +
  coord_flip() +
  scale_y_continuous(expand = c(0,0)) +
  scale_fill_manual(values = c("grey", "#e6550d", "#fdbb84", "#31a354", "#1c9099", "#fde0dd", "#c51b8a")) +
    theme_bw() +
    theme(axis.text = element_text(color="black", size=12),
          legend.title = element_blank(),
          axis.title = element_text(color="black", size=14),
          legend.text = element_text(color = "black", size = 12),
          plot.margin = margin(1, 1, 1, 1, "cm"),
          legend.position = "top") +
  labs(x = "", y = "Proportion of sequences")

dendro_bar_plot_res_cos_SEQ <- ggplot(dendro_res_cos_df, aes(x = SAMPLE_ORDER, y = SEQ_SUM, fill = DISTRIBUTION)) +
  geom_bar(stat = "identity", position = "fill", color = "black") +
  coord_flip() +
  scale_y_continuous(expand = c(0,0)) +
  viridis::scale_fill_viridis(discrete=TRUE, option = "H") +
    theme_bw() +
    theme(axis.text = element_text(color="black", size=12),
          legend.title = element_blank(),
          axis.title = element_text(color="black", size=14),
          legend.text = element_text(color = "black", size = 12),
          plot.margin = margin(1, 1, 1, 1, "cm"),
          legend.position = "top") +
  labs(x = "", y = "Proportion of sequences")
# svg("dendro_prop_asv.svg", h = 12, w = 22)
plot_grid(dendro_plot_output, 
          dendro_bar_plot,
          dendro_bar_plot_res_cos,
          align = c("v"),
          axis = c("tb"),
          nrow = 1,
          greedy = FALSE,
          rel_widths = c(1, 0.6, 0.6))

# dev.off()

Create the same plot, but by sequence proportion.

plot_grid(dendro_plot_output, 
          dendro_bar_plot_SEQ,
          dendro_bar_plot_res_cos_SEQ,
          align = c("v"),
          axis = c("tb"),
          nrow = 1,
          greedy = FALSE,
          rel_widths = c(1, 0.6, 0.6))

12.7 ASV richness: all

# make_asv_rich(insitu_asv_wClass, all)

ASV richness with customized color schema

insitu_asv_wClass %>% 
  filter(SITE %in% all) %>%
  filter(Domain == "Eukaryota") %>% 
  # Average across replicates
    group_by(FeatureID, SAMPLENAME, VENT, Supergroup) %>% 
    summarise(AVG = mean(value)) %>% 
  separate(SAMPLENAME, c("SITE", "SAMPLETYPE", "YEAR", "Sample_tmp"), remove = TRUE) %>% 
  mutate(REGION = case_when(
    SITE == "GordaRidge" ~ "Gorda Ridge",
    SITE %in% mcr ~ "Mid-Cayman Rise",
    SITE == "Axial" ~ "Axial")) %>% 
  mutate(VENTNAME = case_when(
    REGION == "Gorda Ridge" ~ VENT,
    REGION == "Mid-Cayman Rise" ~ paste(SITE, VENT, sep = " "),
    REGION == "Axial" ~ paste(VENT, YEAR, sep = " ")
  )) %>% select(-Sample_tmp) %>% 
  unite(SAMPLE, REGION, VENTNAME, sep = " ", remove = FALSE) %>% 
  ungroup() %>% 
  group_by(SITE, REGION, SAMPLE, SAMPLETYPE) %>% 
    summarise(NUM_ASV = n()) %>% 
  ggplot(aes(x = SAMPLETYPE, y = NUM_ASV, shape = SAMPLETYPE)) +
  geom_boxplot(aes(group = SAMPLETYPE), alpha = 0.8, width = 0.4) +
  geom_jitter(size=2, width = 0.3, aes(fill = SITE)) +
  scale_shape_manual(values = c(21, 23, 24)) +
    scale_fill_manual(values = c("#fdbb84", "#31a354", "#ef3b2c", "#02818a")) +
  theme_bw() +
  theme(axis.text = element_text(color="black", size=12),
        legend.title = element_blank(),
        axis.title = element_text(color="black", size=14),
        legend.text = element_text(color = "black", size = 14)) +
  guides(fill = guide_legend(override.aes = list(shape = 21) ),
            shape = guide_legend(override.aes = list(fill = NA) ) ) +
  labs(x = "", y = "Total number of ASVs")
## Warning: Expected 4 pieces. Additional pieces discarded in 25500 rows [1, 2, 3,
## 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, ...].

12.8 Presence-absence: all

# make_upset_plot(insitu_asv_wClass, all)
# head(insitu_asv_wClass)

12.8.1 UpsetR by ASV level

Repeat above plot, but resolve by sample location and sample type.

alv <- c("Alveolata-Ellobiopsidae", "Alveolata-Perkinsea", "Alveolata-Unknown", "Alveolata-Chrompodellids", "Alveolata-Apicomplexa")
# svg("upsetR-bysite-sampletype-nov2.svg", h=9, w=15)
insitu_asv_wClass %>% 
  filter(SITE %in% all) %>%
  filter(Domain == "Eukaryota") %>% 
  filter(Supergroup != "Opisthokonta") %>% 
  mutate(Supergroup = ifelse(is.na(Supergroup), "Unknown Eukaryota", Supergroup),
         Phylum = ifelse(is.na(Phylum), "Unknown", Phylum),
         Phylum = ifelse(Phylum == "Alveolata_X", "Ellobiopsidae", Phylum),
         Supergroup = ifelse(Supergroup == "Alveolata", paste(Supergroup, Phylum, sep = "-"), Supergroup)) %>% 
  mutate(SUPERGROUP = case_when(
    Supergroup %in% alv ~ "Other Alveolata",
    Supergroup == "Eukaryota_X" ~ "Unknown Eukaryota",
    Phylum == "Cercozoa" ~ "Rhizaria-Cercozoa",
    Phylum == "Radiolaria" ~ "Rhizaria-Radiolaria",
    Phylum == "Ochrophyta" ~ "Stramenopiles-Ochrophyta",
    Phylum == "Opalozoa" ~ "Stramenopiles-Opalozoa",
    Phylum == "Sagenista" ~ "Stramenopiles-Sagenista",
    TRUE ~ Supergroup
  )) %>% 
  # Taxa to supergroup
  mutate(SupergroupPhylum = SUPERGROUP) %>%
  # Average across replicates
    group_by(FeatureID, SAMPLENAME, VENT, SupergroupPhylum) %>% 
    summarise(AVG = mean(value)) %>% 
  ungroup() %>% 
  separate(SAMPLENAME, c("SITE", "SAMPLETYPE", "YEAR", "Sample_tmp"), remove = TRUE) %>% 
  mutate(REGION = case_when(
    SITE == "GordaRidge" ~ "Gorda Ridge",
    SITE %in% mcr ~ "Mid-Cayman Rise",
    SITE == "Axial" ~ "Axial")) %>% 
  mutate(VENTNAME = case_when(
    REGION == "Gorda Ridge" ~ VENT,
    REGION == "Mid-Cayman Rise" ~ paste(SITE, VENT, sep = " "),
    REGION == "Axial" ~ paste(VENT, YEAR, sep = " ")
  )) %>% select(-Sample_tmp) %>% 
  unite(SAMPLE, SITE, SAMPLETYPE, sep = " ", remove = FALSE) %>% 
  group_by(FeatureID, SupergroupPhylum, SAMPLE) %>% 
    summarise(SUM = sum(AVG)) %>%
  # filter(SUM > 200) %>% 
  ungroup() %>% 
  distinct(FeatureID, SupergroupPhylum, SUM, SAMPLE, .keep_all = TRUE) %>% 
  group_by(FeatureID, SupergroupPhylum) %>% 
  summarise(SAMPLE = list(SAMPLE)) %>% 
  ggplot(aes(x = SAMPLE)) +
    geom_bar(color = "black", width = 0.5, aes(fill = SupergroupPhylum)) +
    scale_x_upset(n_intersections = 25) +
  scale_y_continuous(expand = c(0,0)) +
  labs(x = "", y = "Shared ASVs") +
  theme_linedraw() +
  theme(axis.text.y = element_text(color="black", size=14),
        axis.text.x = element_text(color="black", size=14),
        axis.title = element_text(color="black", size=14),
        legend.text = element_text(color = "black", size = 12),
        plot.margin = margin(1, 1, 1, 5, "cm")) +
  scale_fill_manual(values = c("#fa9fb5", "#c51b8a", "#67000d", "#ef3b2c", "#ffffcc", "#feb24c", "#c7e9b4", "#1d91c0", "#deebf7", "#253494", "#9e9ac8", "#238b45", "#54278f", "#bdbdbd", "#252525", "#fa9fb5", "#c51b8a", "#67000d", "#ef3b2c", "#ffffcc", "#feb24c", "#c7e9b4", "#1d91c0", "#253494", "#9e9ac8", "#238b45", "#54278f", "#bdbdbd", "#252525"))
## Warning: Expected 4 pieces. Additional pieces discarded in 23244 rows [1, 2, 3,
## 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, ...].
## Warning: Removed 1425 rows containing non-finite values (stat_count).

# dev.off()

Observations regarding above plot: - Axial and Gorda Ridge vent sites have more shared ASVs than any other pairwise comparison. After this, there were also many ASVs shared throughout MCR (vent, plume, + background). May be a reflection of sample size, as MCR had more vent sites - a small subset of ASVs were found at all vent sites or all samples. - ASVs within the vents had much higher unique # of ASVs (not shared with another habitat type) than any other sample type/location (furtherest left bars).

12.8.2 UpsetR at Genus level

Repeat upsetR plot, but summarize at genus level, rather than “species” or “strain”

head(insitu_asv_wClass)
## # A tibble: 6 × 37
##   FeatureID SAMPLE value Taxon Domain Supergroup Phylum Class Order Family Genus
##   <chr>     <chr>  <dbl> <chr> <chr>  <chr>      <chr>  <chr> <chr> <chr>  <chr>
## 1 00056209… Gorda…     8 Euka… Eukar… Stramenop… Sagen… <NA>  <NA>  <NA>   <NA> 
## 2 00056209… Gorda…    13 Euka… Eukar… Stramenop… Sagen… <NA>  <NA>  <NA>   <NA> 
## 3 00096455… Gorda…    91 Euka… Eukar… Rhizaria   Radio… Acan… <NA>  <NA>   <NA> 
## 4 000ee377… Axial…   282 Euka… Eukar… Alveolata  Cilio… Nass… Nass… Disco… NASS…
## 5 000ee377… Axial…    32 Euka… Eukar… Alveolata  Cilio… Nass… Nass… Disco… NASS…
## 6 00165708… Gorda…     1 Euka… Eukar… Stramenop… Ochro… Pela… Pela… Pelag… Pela…
## # … with 26 more variables: Species <chr>, Consensus <dbl>, SAMPLENAME <chr>,
## #   VENT <chr>, COORDINATES <chr>, SITE <chr>, Sample_or_Control <chr>,
## #   SAMPLEID <chr>, DEPTH <chr>, SAMPLETYPE <chr>, YEAR <int>, TEMP <dbl>,
## #   pH <dbl>, PercSeawater <dbl>, Mg <dbl>, NO3 <dbl>, H2 <dbl>, H2S <dbl>,
## #   CH4 <dbl>, ProkConc <dbl>, Sample_actual <chr>, Type <chr>, DATASET <chr>,
## #   DECONTAM <chr>, CLASS <chr>, SITE_CLASS <chr>
alv <- c("Alveolata-Ellobiopsidae", "Alveolata-Perkinsea", "Alveolata-Unknown", "Alveolata-Chrompodellids", "Alveolata-Apicomplexa")
# svg("upsetR-bysite-sampletype-nov2.svg", h=9, w=15)
insitu_asv_wClass %>% 
  filter(SITE %in% all) %>%
  filter(Domain == "Eukaryota") %>% 
  filter(Supergroup != "Opisthokonta") %>% 
  mutate(Supergroup = ifelse(is.na(Supergroup), "Unknown Eukaryota", Supergroup),
         Phylum = ifelse(is.na(Phylum), "Unknown", Phylum),
         Phylum = ifelse(Phylum == "Alveolata_X", "Ellobiopsidae", Phylum),
         Supergroup = ifelse(Supergroup == "Alveolata", paste(Supergroup, Phylum, sep = "-"), Supergroup)) %>% 
  mutate(SUPERGROUP = case_when(
    Supergroup %in% alv ~ "Other Alveolata",
    Supergroup == "Eukaryota_X" ~ "Unknown Eukaryota",
    Phylum == "Cercozoa" ~ "Rhizaria-Cercozoa",
    Phylum == "Radiolaria" ~ "Rhizaria-Radiolaria",
    Phylum == "Ochrophyta" ~ "Stramenopiles-Ochrophyta",
    Phylum == "Opalozoa" ~ "Stramenopiles-Opalozoa",
    Phylum == "Sagenista" ~ "Stramenopiles-Sagenista",
    TRUE ~ Supergroup
  )) %>% 
  # Taxa to supergroup
  mutate(SupergroupPhylum = SUPERGROUP) %>%
  unite(GENUS, Domain:Genus, sep = ";") %>% 
  # Average across replicates
    group_by(GENUS, SAMPLENAME, VENT, SupergroupPhylum) %>% 
    summarise(AVG = mean(value)) %>% 
  ungroup() %>% 
  separate(SAMPLENAME, c("SITE", "SAMPLETYPE", "YEAR", "Sample_tmp"), remove = TRUE) %>% 
  mutate(REGION = case_when(
    SITE == "GordaRidge" ~ "Gorda Ridge",
    SITE %in% mcr ~ "Mid-Cayman Rise",
    SITE == "Axial" ~ "Axial")) %>% 
  mutate(VENTNAME = case_when(
    REGION == "Gorda Ridge" ~ VENT,
    REGION == "Mid-Cayman Rise" ~ paste(SITE, VENT, sep = " "),
    REGION == "Axial" ~ paste(VENT, YEAR, sep = " ")
  )) %>% select(-Sample_tmp) %>% 
  unite(SAMPLE, SITE, SAMPLETYPE, sep = " ", remove = FALSE) %>% 
  group_by(GENUS, SupergroupPhylum, SAMPLE) %>% 
    summarise(SUM = sum(AVG)) %>%
  # filter(SUM > 200) %>% 
  ungroup() %>% 
  distinct(GENUS, SupergroupPhylum, SUM, SAMPLE, .keep_all = TRUE) %>% 
  group_by(GENUS, SupergroupPhylum) %>% 
  summarise(SAMPLE = list(SAMPLE)) %>% 
  ggplot(aes(x = SAMPLE)) +
    geom_bar(color = "black", width = 0.5, aes(fill = SupergroupPhylum)) +
    scale_x_upset(n_intersections = 25) +
  scale_y_continuous(expand = c(0,0)) +
  labs(x = "", y = "Shared ASVs at Genus level") +
  theme_linedraw() +
  theme(axis.text.y = element_text(color="black", size=14),
        axis.text.x = element_text(color="black", size=14),
        axis.title = element_text(color="black", size=14),
        legend.text = element_text(color = "black", size = 12),
        plot.margin = margin(1, 1, 1, 5, "cm")) +
  scale_fill_manual(values = c("#fa9fb5", "#c51b8a", "#67000d", "#ef3b2c", "#ffffcc", "#feb24c", "#c7e9b4", "#1d91c0", "#deebf7", "#253494", "#9e9ac8", "#238b45", "#54278f", "#bdbdbd", "#252525", "#fa9fb5", "#c51b8a", "#67000d", "#ef3b2c", "#ffffcc", "#feb24c", "#c7e9b4", "#1d91c0", "#253494", "#9e9ac8", "#238b45", "#54278f", "#bdbdbd", "#252525"))
## Warning: Expected 4 pieces. Additional pieces discarded in 8229 rows [1, 2, 3,
## 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, ...].
## Warning: Removed 316 rows containing non-finite values (stat_count).

13 Plot ASV prevalence by sequence?

# Not sure if I need this
tax_key <- insitu_asv_wClass %>% 
  select(FeatureID, Taxon, Domain, Supergroup, Phylum, Class, Order, Family, Genus, Species, CLASS, SITE_CLASS) %>% 
  distinct()
# head(insitu_asv_wClass)
alv <- c("Alveolata-Ellobiopsidae", "Alveolata-Perkinsea", "Alveolata-Unknown", "Alveolata-Chrompodellids", "Alveolata-Apicomplexa")

tmp <- insitu_asv_wClass %>% 
  filter(Domain == "Eukaryota") %>% 
  filter(!is.na(Supergroup)) %>% 
  # Average across replicates
    group_by(FeatureID, SAMPLENAME, VENT) %>% 
    summarise(AVG = mean(value)) %>% 
  ungroup() %>% 
  separate(SAMPLENAME, c("SITE", "SAMPLETYPE", "YEAR", "Sample_tmp"), remove = TRUE) %>% 
  mutate(REGION = case_when(
    SITE == "GordaRidge" ~ "Gorda Ridge",
    SITE %in% mcr ~ "Mid-Cayman Rise",
    SITE == "Axial" ~ "Axial")) %>% 
  mutate(VENTNAME = case_when(
    REGION == "Gorda Ridge" ~ VENT,
    REGION == "Mid-Cayman Rise" ~ paste(SITE, VENT, sep = " "),
    REGION == "Axial" ~ paste(VENT, YEAR, sep = " ")
  )) %>% select(-Sample_tmp) %>% 
  unite(SAMPLE, REGION, VENTNAME, sep = " ", remove = FALSE) %>% 
  group_by(FeatureID, SAMPLE) %>% 
  summarise(SUM = sum(AVG)) %>% 
  pivot_wider(names_from = "SAMPLE", values_from = SUM, values_fill = 0) %>% 
  column_to_rownames(var = "FeatureID") %>%
  mutate(PREVALENCE = rowSums(. > 0),
         SEQ_TOTAL = rowSums(.)) %>% 
  rownames_to_column(var = "FeatureID") %>% 
  left_join(tax_key) %>% 
   filter(Supergroup != "Opisthokonta") %>% 
  mutate(Supergroup = ifelse(is.na(Supergroup), "Unknown Eukaryota", Supergroup),
         Phylum = ifelse(is.na(Phylum), "Unknown", Phylum),
         Phylum = ifelse(Phylum == "Alveolata_X", "Ellobiopsidae", Phylum),
         Supergroup = ifelse(Supergroup == "Alveolata", paste(Supergroup, Phylum, sep = "-"), Supergroup)) %>% 
  mutate(SUPERGROUP = case_when(
    Supergroup %in% alv ~ "Other Alveolata",
    Supergroup == "Eukaryota_X" ~ "Unknown Eukaryota",
    Phylum == "Cercozoa" ~ "Rhizaria-Cercozoa",
    Phylum == "Radiolaria" ~ "Rhizaria-Radiolaria",
    Phylum == "Ochrophyta" ~ "Stramenopiles-Ochrophyta",
    Phylum == "Opalozoa" ~ "Stramenopiles-Opalozoa",
    Phylum == "Sagenista" ~ "Stramenopiles-Sagenista",
    TRUE ~ Supergroup
  ))
## Warning: Expected 4 pieces. Additional pieces discarded in 25354 rows [1, 2, 3,
## 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, ...].
# head(tmp)
tmp %>% 
  filter(SEQ_TOTAL > 0) %>% 
  ggplot(aes(x = PREVALENCE, y = SEQ_TOTAL, fill = SUPERGROUP)) +
  geom_jitter(stat = "identity", shape = 21) +
  scale_y_log10() +
  facet_wrap(SUPERGROUP ~ .) +
  scale_fill_manual(values = c("#fa9fb5", "#c51b8a", "#67000d", "#ef3b2c", "#ffffcc", "#feb24c", "#c7e9b4", "#1d91c0", "#deebf7", "#253494", "#9e9ac8", "#238b45", "#54278f", "#bdbdbd", "#252525", "#fa9fb5", "#c51b8a", "#67000d", "#ef3b2c", "#ffffcc", "#feb24c", "#c7e9b4", "#1d91c0", "#253494", "#9e9ac8", "#238b45", "#54278f", "#bdbdbd", "#252525")) +
  theme_linedraw() +
  labs(x = "Number of samples ASV appears in", y = "Total sequences (log)")

14 Compare resident vs. cosmopolitan taxa

head(insitu_asv_wClass) # from above, where I've classified each ASV by site and occurence in sample type
## # A tibble: 6 × 37
##   FeatureID SAMPLE value Taxon Domain Supergroup Phylum Class Order Family Genus
##   <chr>     <chr>  <dbl> <chr> <chr>  <chr>      <chr>  <chr> <chr> <chr>  <chr>
## 1 00056209… Gorda…     8 Euka… Eukar… Stramenop… Sagen… <NA>  <NA>  <NA>   <NA> 
## 2 00056209… Gorda…    13 Euka… Eukar… Stramenop… Sagen… <NA>  <NA>  <NA>   <NA> 
## 3 00096455… Gorda…    91 Euka… Eukar… Rhizaria   Radio… Acan… <NA>  <NA>   <NA> 
## 4 000ee377… Axial…   282 Euka… Eukar… Alveolata  Cilio… Nass… Nass… Disco… NASS…
## 5 000ee377… Axial…    32 Euka… Eukar… Alveolata  Cilio… Nass… Nass… Disco… NASS…
## 6 00165708… Gorda…     1 Euka… Eukar… Stramenop… Ochro… Pela… Pela… Pelag… Pela…
## # … with 26 more variables: Species <chr>, Consensus <dbl>, SAMPLENAME <chr>,
## #   VENT <chr>, COORDINATES <chr>, SITE <chr>, Sample_or_Control <chr>,
## #   SAMPLEID <chr>, DEPTH <chr>, SAMPLETYPE <chr>, YEAR <int>, TEMP <dbl>,
## #   pH <dbl>, PercSeawater <dbl>, Mg <dbl>, NO3 <dbl>, H2 <dbl>, H2S <dbl>,
## #   CH4 <dbl>, ProkConc <dbl>, Sample_actual <chr>, Type <chr>, DATASET <chr>,
## #   DECONTAM <chr>, CLASS <chr>, SITE_CLASS <chr>
unique(insitu_asv_wClass$CLASS)
## [1] "Vent only"                 "Background only"          
## [3] "Vent & background"         "Vent, plume, & background"
## [5] "Plume only"                "Vent & plume"             
## [7] "Plume & background"
unique(insitu_asv_wClass$SITE_CLASS)
## [1] "Gorda Ridge only"              "Axial only"                   
## [3] "Mid-Cayman Rise"               "Mid-Cayman Rise & Axial"      
## [5] "Axial & Gorda Ridge"           "Mid-Cayman Rise & Gorda Ridge"
## [7] "All sites"
unique(insitu_asv_wClass$SAMPLETYPE)
## [1] "Vent"       "Background" "Plume"
tmp <- (insitu_asv_wClass %>% 
          filter(DATASET == "MCR") %>% 
       group_by(CLASS) %>% 
       summarise(SEQ = sum(value),
                      COUNT = n()))
tmp
## # A tibble: 7 × 3
##   CLASS                        SEQ COUNT
##   <chr>                      <dbl> <int>
## 1 Background only            16529   383
## 2 Plume & background         93779   250
## 3 Plume only                 24647   489
## 4 Vent & background         181232   818
## 5 Vent & plume              217335   478
## 6 Vent only                 587304  3953
## 7 Vent, plume, & background 954469  3006
# Vent only
587304/sum(tmp$SEQ) #33%
## [1] 0.2829978
3953/sum(tmp$COUNT)
## [1] 0.4215634
# Cosmo
954469/sum(tmp$SEQ)
## [1] 0.4599197
3006/sum(tmp$COUNT)
## [1] 0.3205716

33% of sequences were vent-only 42% of ASVs were vent-only

45% of sequences were cosmopolitan 32% of ASVs were cosmopolitan

What Supergroups are associated with resident vs. endemic? what about to specific sites?

make_bar_bycategory <- function(df, category, position){
  CATEGORY <- enquo(category)
  df_out <- df %>% 
  filter(Domain == "Eukaryota") %>%
  mutate(Supergroup = ifelse(is.na(Supergroup), "Unknown Eukaryota", Supergroup),
         Phylum = ifelse(is.na(Phylum), "Unknown", Phylum),
         Phylum = ifelse(Phylum == "Alveolata_X", "Ellobiopsidae", Phylum),
         Supergroup = ifelse(Supergroup == "Alveolata", paste(Supergroup, Phylum, sep = "-"), Supergroup)) %>% 
  group_by(FeatureID, Taxon, Domain, Supergroup, Phylum, Class, Order, Family, Genus, Species,
           VENT, SITE, SAMPLETYPE, YEAR, DATASET, !!CATEGORY) %>% 
    summarise(SEQ_AVG_REP = mean(value)) %>% 
  ungroup()
  ## Supergroup
  supergroup <- df_out %>% 
  group_by(Supergroup, !!CATEGORY) %>% 
    summarise(SEQ_SUM = sum(SEQ_AVG_REP)) %>% 
  ggplot(aes(x = !!CATEGORY, y = SEQ_SUM, fill = Supergroup)) +
    geom_bar(stat = "identity", position = position, color = "black", width = 0.9) +
    # facet_grid(. ~ SITE +YEAR + SAMPLETYPE, scale = "free", space = "free") +
  theme_linedraw() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1),
        strip.background = element_blank(), strip.text = element_text(color = "black"),
        legend.position = "right") +
  scale_y_continuous(expand = c(0,0)) +
  # scale_fill_brewer(palette = "Set2") +
  scale_fill_manual(values = c("#fa9fb5", "#c51b8a", "#67000d", "#ef3b2c", "#ffffcc", "#feb24c", "#c7e9b4", "#1d91c0", "#253494", "#9e9ac8", "#238b45", "#54278f", "#bdbdbd", "#252525", "#fa9fb5", "#c51b8a", "#67000d", "#ef3b2c", "#ffffcc", "#feb24c", "#c7e9b4", "#1d91c0", "#253494", "#9e9ac8", "#238b45", "#54278f", "#bdbdbd", "#252525")) +
  labs(x = "", y = "Relative abundance")
  ## Phylum
  phylum <- df_out %>% 
    unite(SupergroupPhylum, Supergroup, Phylum, sep = "-") %>% 
  group_by(SupergroupPhylum, !!CATEGORY) %>% 
    summarise(SEQ_SUM = sum(SEQ_AVG_REP)) %>% 
  ggplot(aes(x = !!CATEGORY, y = SEQ_SUM, fill = SupergroupPhylum)) +
    geom_bar(stat = "identity", position = position, color = "black", width = 0.9) +
    # facet_grid(. ~ SITE +YEAR + SAMPLETYPE, scale = "free", space = "free") +
  theme_linedraw() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1),
        strip.background = element_blank(), strip.text = element_text(color = "black"),
        legend.position = "right") +
  scale_y_continuous(expand = c(0,0)) +
  scale_fill_manual(values = c("#f1eef6", "#d7b5d8", "#df65b0", "#ce1256", "#fc9272", "#ef3b2c", 
    "#800026", "#fff7bc", "#fec44f", "#d95f0e", "#a63603", "#74c476", "#238b45", 
    "#00441b", "#7fcdbb", "#084081", "#c6dbef", "#2b8cbe", "#016c59", "#bcbddc", 
    "#807dba", "#54278f", "#bdbdbd", "black", "white", "#969696", "#525252", "#f1eef6", "#d7b5d8", "#df65b0", "#ce1256", "#fc9272", "#ef3b2c", 
    "#800026", "#fff7bc", "#fec44f", "#d95f0e", "#a63603", "#74c476", "#238b45", 
    "#00441b", "#7fcdbb", "#084081", "#c6dbef", "#2b8cbe", "#016c59", "#bcbddc", 
    "#807dba", "#54278f", "#bdbdbd", "black", "white")) +
  labs(x = "", y = "Relative abundance")
  supergroup + phylum + patchwork::plot_layout(ncol = 1)
}
make_tile_bycategory <- function(df, category, position){
  CATEGORY <- enquo(category)
  df_out <- df %>% 
  filter(Domain == "Eukaryota") %>%
  mutate(Supergroup = ifelse(is.na(Supergroup), "Unknown Eukaryota", Supergroup),
         Phylum = ifelse(is.na(Phylum), "Unknown", Phylum),
         Phylum = ifelse(Phylum == "Alveolata_X", "Ellobiopsidae", Phylum),
         Supergroup = ifelse(Supergroup == "Alveolata", paste(Supergroup, Phylum, sep = "-"), Supergroup)) %>% 
  group_by(FeatureID, Taxon, Domain, Supergroup, Phylum, Class, Order, Family, Genus, Species,
           VENT, SITE, SAMPLETYPE, YEAR, DATASET, !!CATEGORY) %>% 
    summarise(SEQ_AVG_REP = mean(value)) %>% 
  ungroup()
  ## Supergroup
  supergroup <- df_out %>% 
  group_by(Supergroup, !!CATEGORY) %>% 
    summarise(SEQ_SUM = sum(SEQ_AVG_REP)) %>% 
  ggplot(aes(x = !!CATEGORY, fill = log(SEQ_SUM), y = Supergroup)) +
    geom_tile(color = "black") +
    # facet_grid(. ~ SITE +YEAR + SAMPLETYPE, scale = "free", space = "free") +
  theme_linedraw() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1),
        strip.background = element_blank(), strip.text = element_text(color = "black"),
        legend.position = "right") +
  scale_fill_gradient(low = "#ffeda0", high = "#e31a1c", na.value = "grey50") +
  labs(x = "Distribution", y = "")
  supergroup
}
make_tile_bycategory(insitu_asv_wClass, CLASS, "fill")

# make_bar_bycategory(insitu_asv_wClass, CLASS, "fill")
make_bar_bycategory(insitu_asv_wClass, CLASS, "stack")

make_bar_bycategory(insitu_asv_wClass, SITE_CLASS, "fill")

make_tile_bycategory(insitu_asv_wClass, SITE_CLASS, "fill")

make_bar_bycategory(insitu_asv_wClass, SITE_CLASS, "stack")

head(insitu_asv_wClass)
## # A tibble: 6 × 37
##   FeatureID SAMPLE value Taxon Domain Supergroup Phylum Class Order Family Genus
##   <chr>     <chr>  <dbl> <chr> <chr>  <chr>      <chr>  <chr> <chr> <chr>  <chr>
## 1 00056209… Gorda…     8 Euka… Eukar… Stramenop… Sagen… <NA>  <NA>  <NA>   <NA> 
## 2 00056209… Gorda…    13 Euka… Eukar… Stramenop… Sagen… <NA>  <NA>  <NA>   <NA> 
## 3 00096455… Gorda…    91 Euka… Eukar… Rhizaria   Radio… Acan… <NA>  <NA>   <NA> 
## 4 000ee377… Axial…   282 Euka… Eukar… Alveolata  Cilio… Nass… Nass… Disco… NASS…
## 5 000ee377… Axial…    32 Euka… Eukar… Alveolata  Cilio… Nass… Nass… Disco… NASS…
## 6 00165708… Gorda…     1 Euka… Eukar… Stramenop… Ochro… Pela… Pela… Pelag… Pela…
## # … with 26 more variables: Species <chr>, Consensus <dbl>, SAMPLENAME <chr>,
## #   VENT <chr>, COORDINATES <chr>, SITE <chr>, Sample_or_Control <chr>,
## #   SAMPLEID <chr>, DEPTH <chr>, SAMPLETYPE <chr>, YEAR <int>, TEMP <dbl>,
## #   pH <dbl>, PercSeawater <dbl>, Mg <dbl>, NO3 <dbl>, H2 <dbl>, H2S <dbl>,
## #   CH4 <dbl>, ProkConc <dbl>, Sample_actual <chr>, Type <chr>, DATASET <chr>,
## #   DECONTAM <chr>, CLASS <chr>, SITE_CLASS <chr>
categories <- c("Vent only", "Vent, plume, & background")
insitu_asv_wClass %>% 
  filter(CLASS %in% categories) %>% 
  mutate(CAT = case_when(
    CLASS == "Vent only" ~ "Resident",
    TRUE ~ "Cosmopolitan"
  )) %>% 
  group_by(CAT) %>% 
  summarise(SUM = sum(value),
            COUNT = n()) %>% 
  pivot_longer(c(SUM, COUNT)) %>% 
ggplot(aes(x = name, y = value, fill = CAT))+
  geom_bar(stat = "identity", position = "dodge", color = "black") +
  theme_linedraw() +
  facet_grid(name ~ ., scales = "free")+
  scale_fill_brewer(palette = "Dark2") +
  theme(legend.title = element_blank()) +
  labs(x = "", y = "Total number of ASVs")

14.1 Isolate shared ASVs among all

all_vents <- insitu_asv_wClass %>% 
  filter(SITE_CLASS == "All sites" & grepl("Vent", CLASS)) %>% 
  filter(SAMPLETYPE == "Vent")
# View(all_vents)
# head(insitu_asv_wClass)

Presence/absence of ASVs to the Class level across all vents.

# svg("pa-ventsites.svg", w=8, h=12)
insitu_asv_wClass %>% 
  filter(grepl("Vent", CLASS)) %>% 
  filter(Domain == "Eukaryota") %>% 
  filter(SAMPLETYPE == "Vent") %>% 
  filter(!is.na(Class)) %>% 
  filter(Phylum != "Stramenopiles_X" & Phylum != "Pseudofungi" & Phylum != "Chrompodellids") %>%
  filter(Supergroup != "Opisthokonta") %>% 
  mutate(Supergroup = ifelse(is.na(Supergroup), "Unknown Eukaryota", Supergroup),
         Phylum = ifelse(is.na(Phylum), "Unknown", Phylum),
         Phylum = ifelse(Phylum == "Alveolata_X", "Ellobiopsidae", Phylum),
         Supergroup = ifelse(Supergroup == "Alveolata", paste(Supergroup, Phylum, sep = "-"), Supergroup)) %>% 
  group_by(FeatureID, Taxon, Domain, Supergroup, Phylum, Class, Order, Family, Genus, Species,
           VENT, SITE, SAMPLETYPE, YEAR, DATASET) %>% 
    summarise(SEQ_AVG_REP = mean(value)) %>% 
  ungroup() %>% 
  mutate(SITE_2 = case_when(
    SITE == "Piccard" ~ "MCR",
    SITE == "VonDamm" ~ "MCR",
    TRUE ~ SITE
  )) %>% 
  group_by(SITE_2, SAMPLETYPE, VENT, Supergroup, Class, Order) %>% 
    summarise(SEQ_SUM = sum(SEQ_AVG_REP)) %>% 
  ggplot(aes(x = VENT, y = Class, fill = log(SEQ_SUM))) +
    geom_tile(stat = "identity", color = "white", width = 0.9) +
    facet_grid(Supergroup ~ SITE_2 + SAMPLETYPE , scale = "free", space = "free") +
  scale_fill_gradient(low = "#ffffcc", high = "#bd0026", na.value = "grey50") +
  theme_linedraw() +
  theme(axis.text.x = element_text(angle = 90, hjust = 0.5, vjust = 0.5),
        strip.background = element_blank(), strip.text.x = element_text(color = "black", angle = 0),
        panel.grid = element_blank(), strip.text.y = element_text(color = "black", angle = 0))

# dev.off()

What is the vent signature across the different sites?

Repeat above plot, isolate ciliate, dinos, rhizaria, and stramenopiles.

phyla <- c("Alveolata-Ciliophora", "Alveolata-Dinoflagellata", "Rhizaria", "Stramenopiles")

# svg("pa-ventsites-selected.svg", w=8, h=10)
insitu_asv_wClass %>% 
  filter(grepl("Vent", CLASS)) %>% 
  filter(Domain == "Eukaryota") %>% 
  filter(SAMPLETYPE == "Vent") %>% 
  filter(!is.na(Class)) %>% 
  filter(Phylum != "Stramenopiles_X" & Phylum != "Pseudofungi" & Phylum != "Chrompodellids") %>%
  filter(Supergroup != "Opisthokonta") %>% 
  mutate(Supergroup = ifelse(is.na(Supergroup), "Unknown Eukaryota", Supergroup),
         Phylum = ifelse(is.na(Phylum), "Unknown", Phylum),
         Phylum = ifelse(Phylum == "Alveolata_X", "Ellobiopsidae", Phylum),
         Supergroup = ifelse(Supergroup == "Alveolata", paste(Supergroup, Phylum, sep = "-"), Supergroup)) %>% 
  filter(Supergroup %in% phyla) %>% 
  group_by(FeatureID, Taxon, Domain, Supergroup, Phylum, Class, Order, Family, Genus, Species,
           VENT, SITE, SAMPLETYPE, YEAR, DATASET) %>% 
    summarise(SEQ_AVG_REP = mean(value)) %>% 
  ungroup() %>% 
  mutate(SITE_2 = case_when(
    SITE == "Piccard" ~ "MCR",
    SITE == "VonDamm" ~ "MCR",
    TRUE ~ SITE
  )) %>% 
  group_by(SITE_2, SAMPLETYPE, VENT, Supergroup, Class) %>% 
    summarise(SEQ_SUM = sum(SEQ_AVG_REP)) %>% 
  ggplot(aes(x = VENT, y = Class, fill = SEQ_SUM)) +
    geom_tile(stat = "identity", color = "white", width = 0.9, fill = "black") +
    facet_grid(Supergroup ~ SITE_2 + SAMPLETYPE , scale = "free", space = "free") +
  theme_linedraw() +
  theme(axis.text.x = element_text(angle = 90, hjust = 0.5, vjust = 0.5),
        strip.background = element_blank(), strip.text.x = element_text(color = "black", angle = 0),
        panel.grid = element_blank(), strip.text.y = element_text(color = "black", angle = 0))

# dev.off()

14.2 Among resident ASVs, what is distribution?

head(insitu_asv_wClass)
## # A tibble: 6 × 37
##   FeatureID SAMPLE value Taxon Domain Supergroup Phylum Class Order Family Genus
##   <chr>     <chr>  <dbl> <chr> <chr>  <chr>      <chr>  <chr> <chr> <chr>  <chr>
## 1 00056209… Gorda…     8 Euka… Eukar… Stramenop… Sagen… <NA>  <NA>  <NA>   <NA> 
## 2 00056209… Gorda…    13 Euka… Eukar… Stramenop… Sagen… <NA>  <NA>  <NA>   <NA> 
## 3 00096455… Gorda…    91 Euka… Eukar… Rhizaria   Radio… Acan… <NA>  <NA>   <NA> 
## 4 000ee377… Axial…   282 Euka… Eukar… Alveolata  Cilio… Nass… Nass… Disco… NASS…
## 5 000ee377… Axial…    32 Euka… Eukar… Alveolata  Cilio… Nass… Nass… Disco… NASS…
## 6 00165708… Gorda…     1 Euka… Eukar… Stramenop… Ochro… Pela… Pela… Pelag… Pela…
## # … with 26 more variables: Species <chr>, Consensus <dbl>, SAMPLENAME <chr>,
## #   VENT <chr>, COORDINATES <chr>, SITE <chr>, Sample_or_Control <chr>,
## #   SAMPLEID <chr>, DEPTH <chr>, SAMPLETYPE <chr>, YEAR <int>, TEMP <dbl>,
## #   pH <dbl>, PercSeawater <dbl>, Mg <dbl>, NO3 <dbl>, H2 <dbl>, H2S <dbl>,
## #   CH4 <dbl>, ProkConc <dbl>, Sample_actual <chr>, Type <chr>, DATASET <chr>,
## #   DECONTAM <chr>, CLASS <chr>, SITE_CLASS <chr>
unique(insitu_asv_wClass$CLASS)
## [1] "Vent only"                 "Background only"          
## [3] "Vent & background"         "Vent, plume, & background"
## [5] "Plume only"                "Vent & plume"             
## [7] "Plume & background"
unique(insitu_asv_wClass$SITE_CLASS)
## [1] "Gorda Ridge only"              "Axial only"                   
## [3] "Mid-Cayman Rise"               "Mid-Cayman Rise & Axial"      
## [5] "Axial & Gorda Ridge"           "Mid-Cayman Rise & Gorda Ridge"
## [7] "All sites"
unique(insitu_asv_wClass$SAMPLETYPE)
## [1] "Vent"       "Background" "Plume"
# head(insitu_asv_wClass)
insitu_asv_wClass %>% 
  # filter(SITE %in% selection) %>%
  filter(Domain == "Eukaryota") %>% 
  filter(!is.na(Supergroup)) %>% 
  # Average across replicates
    group_by(FeatureID, SAMPLENAME, VENT, Supergroup, Phylum, CLASS, SITE_CLASS) %>% 
    summarise(AVG = mean(value)) %>% 
  separate(SAMPLENAME, c("SITE", "SAMPLETYPE", "YEAR", "Sample_tmp"), remove = TRUE) %>% 
  mutate(REGION = case_when(
    SITE == "GordaRidge" ~ "Gorda Ridge",
    SITE %in% mcr ~ "Mid-Cayman Rise",
    SITE == "Axial" ~ "Axial")) %>% 
  mutate(VENTNAME = case_when(
    REGION == "Gorda Ridge" ~ VENT,
    REGION == "Mid-Cayman Rise" ~ paste(SITE, VENT, sep = " "),
    REGION == "Axial" ~ paste(VENT, YEAR, sep = " ")
  )) %>% select(-Sample_tmp) %>% 
  unite(SAMPLE, SITE, VENTNAME, sep = " ", remove = FALSE) %>% 
  # filter(CLASS == "Vent only") %>%
  group_by(Supergroup, CLASS) %>% 
    summarise(SEQ_SUM = sum(AVG),
     ASV_COUNT = n()) %>% 
  pivot_longer(cols = c(SEQ_SUM, ASV_COUNT)) %>%
  filter(name == "SEQ_SUM") %>%
  ggplot(aes(x = CLASS, y = value, fill = Supergroup)) +
    geom_hline(yintercept = 0) +
    geom_segment(aes(x = CLASS, xend = CLASS,
                     y = 0, yend = value, color = Supergroup),
                 lineend = "butt", size = 1) +
    geom_point(size = 2, shape = 19, aes(color = Supergroup)) +
    scale_fill_manual(values = c("#fa9fb5", "#c51b8a", "#67000d", "#ef3b2c", "#ffffcc", "#feb24c", "#c7e9b4", "#1d91c0", "#253494", "#9e9ac8", "#238b45", "#54278f", "#bdbdbd", "#252525")) +
    scale_color_manual(values = c("#fa9fb5", "#c51b8a", "#67000d", "#ef3b2c", "#ffffcc", "#feb24c", "#c7e9b4", "#1d91c0", "#253494", "#9e9ac8", "#238b45", "#54278f", "#bdbdbd", "#252525")) +
    theme_bw() +
    facet_grid(. ~ Supergroup, scales = "free") +
    theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1, color = "black", size = 11),
          axis.text.y = element_text(color = "black", size = 12),
        panel.spacing.x = unit(0, "lines"),panel.spacing.y = unit(0, "lines"),
        panel.border = element_blank(),
        panel.grid = element_blank(),
        strip.background.x = element_blank(),
        strip.text = element_text(size = 11),
        legend.position = "none") +
  coord_flip() +
  labs(x = "", y ="Total sequences", title = "Number of 'vent-only' sequences by Supergroup & location")
## Warning: Expected 4 pieces. Additional pieces discarded in 25354 rows [1, 2, 3,
## 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, ...].

# ?scale_fill_brewer
# head(insitu_asv_wClass)
insitu_asv_wClass %>% 
  # filter(SITE %in% selection) %>%
  filter(Domain == "Eukaryota") %>% 
  filter(!is.na(Supergroup)) %>% 
  # Average across replicates
    group_by(FeatureID, SAMPLENAME, VENT, Supergroup, Phylum, CLASS, SITE_CLASS) %>% 
    summarise(AVG = mean(value)) %>% 
  separate(SAMPLENAME, c("SITE", "SAMPLETYPE", "YEAR", "Sample_tmp"), remove = TRUE) %>% 
  mutate(REGION = case_when(
    SITE == "GordaRidge" ~ "Gorda Ridge",
    SITE %in% mcr ~ "Mid-Cayman Rise",
    SITE == "Axial" ~ "Axial")) %>% 
  mutate(VENTNAME = case_when(
    REGION == "Gorda Ridge" ~ VENT,
    REGION == "Mid-Cayman Rise" ~ paste(SITE, VENT, sep = " "),
    REGION == "Axial" ~ paste(VENT, YEAR, sep = " ")
  )) %>% select(-Sample_tmp) %>% 
  unite(SAMPLE, REGION, VENTNAME, sep = " ", remove = FALSE) %>% 
  filter(CLASS == "Vent only") %>%
  group_by(Supergroup, SITE_CLASS) %>% 
    summarise(SEQ_SUM = sum(AVG),
     ASV_COUNT = n()) %>% 
  pivot_longer(cols = c(SEQ_SUM, ASV_COUNT)) %>%
  filter(name != "SEQ_SUM") %>%
  ggplot(aes(x = SITE_CLASS, y = value, fill = Supergroup)) +
    geom_hline(yintercept = 0) +
    geom_segment(aes(x = SITE_CLASS, xend = SITE_CLASS,
                     y = 0, yend = value, color = Supergroup),
                 lineend = "butt", size = 1) +
    geom_point(size = 2, shape = 19, aes(color = Supergroup)) +
    scale_fill_manual(values = c("#fa9fb5", "#c51b8a", "#67000d", "#ef3b2c", "#ffffcc", "#feb24c", "#c7e9b4", "#1d91c0", "#253494", "#9e9ac8", "#238b45", "#54278f", "#bdbdbd", "#252525")) +
    scale_color_manual(values = c("#fa9fb5", "#c51b8a", "#67000d", "#ef3b2c", "#ffffcc", "#feb24c", "#c7e9b4", "#1d91c0", "#253494", "#9e9ac8", "#238b45", "#54278f", "#bdbdbd", "#252525")) +
    theme_bw() +
    facet_grid(. ~ Supergroup, scales = "free") +
    theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1, color = "black", size = 11),
          axis.text.y = element_text(color = "black", size = 12),
        panel.spacing.x = unit(0, "lines"),panel.spacing.y = unit(0, "lines"),
        panel.border = element_blank(),
        panel.grid = element_blank(),
        strip.background.x = element_blank(),
        strip.text = element_text(size = 11),
        legend.position = "none") +
  coord_flip() +
  labs(x = "", y ="Total ASVs", title = "Number of 'vent-only' ASVs by Supergroup & location")
## Warning: Expected 4 pieces. Additional pieces discarded in 25354 rows [1, 2, 3,
## 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, ...].

# ?scale_fill_brewer

14.2.1 Isolate vent-only (putative endemic taxa)

Here, I’ve isolated almost 800,000 sequences belonging to the putative endemic ASVs (vent only), totaling to 3789 ASVs. This subset includes ASVs with 10 or more sequences (a threshold to reduce noise).

endemic <- insitu_asv_wClass %>% 
  filter(Supergroup != "Opisthokonta") %>% 
  filter(CLASS == "Vent only") %>% 
  filter(value > 9) %>% 
  filter(!is.na(Supergroup))

# Sum of putative endemic sequences and ASVs
sum(endemic$value)
## [1] 795686
length(unique(endemic$FeatureID))
## [1] 3789

Tile plot by Class level? CLR? Coord flip below and add environmental data as heatmap along side? Combine years from Axial, group by site? Do a better compilation of taxa… additional thresholds?

alv <- c("Alveolata-Ellobiopsidae", "Alveolata-Perkinsea", "Alveolata-Unknown", "Alveolata-Chrompodellids", "Alveolata-Apicomplexa")

endemic_processed <- endemic %>% 
    filter(Domain == "Eukaryota") %>% 
    filter(Supergroup != "Opisthokonta") %>% 
   mutate(Supergroup = ifelse(is.na(Supergroup), "Unknown Eukaryota", Supergroup),
         Phylum = ifelse(is.na(Phylum), "Unknown", Phylum),
         Phylum = ifelse(Phylum == "Alveolata_X", "Ellobiopsidae", Phylum),
         Supergroup = ifelse(Supergroup == "Alveolata", paste(Supergroup, Phylum, sep = "-"), Supergroup)) %>% 
  mutate(SUPERGROUP = case_when(
    Supergroup %in% alv ~ "Other Alveolata",
    Supergroup == "Eukaryota_X" ~ "Unknown Eukaryota",
    Phylum == "Cercozoa" ~ "Rhizaria-Cercozoa",
    Phylum == "Radiolaria" ~ "Rhizaria-Radiolaria",
    Phylum == "Ochrophyta" ~ "Stramenopiles-Ochrophyta",
    Phylum == "Opalozoa" ~ "Stramenopiles-Opalozoa",
    Phylum == "Sagenista" ~ "Stramenopiles-Sagenista",
    TRUE ~ Supergroup
  )) %>% 
  mutate(PHYLUM = case_when(
    Phylum == "Unknown" ~ paste(SUPERGROUP, "Other"),
    grepl("_X", Phylum) ~ paste(SUPERGROUP, "Other"),
    is.na(Phylum) ~ paste(SUPERGROUP, "Other"),
    TRUE ~ Phylum
  )) %>% 
  mutate(CLASS = case_when(
    Class == "Unknown" ~ PHYLUM,
    grepl("_X", Class) ~ PHYLUM,
    is.na(Class) ~ Phylum,
    grepl("MAST-", Class) ~ "MAST",
    TRUE ~ Class
  )) %>% 
  filter(SUPERGROUP != "Archaeplastida") %>%
  # Average across replicates
      group_by(FeatureID, SAMPLENAME, VENT, Domain, SUPERGROUP, PHYLUM, CLASS, Order, Family, Genus, Species) %>% 
        summarise(AVG = mean(value)) %>% 
      ungroup() %>% 
    filter(!is.na(SUPERGROUP)) %>% 
    # Sum to the Order taxonomic classification
    unite(SAMPLENAME_2, SAMPLENAME, VENT, sep = "_") %>% 
      group_by(SAMPLENAME_2, SUPERGROUP, PHYLUM, CLASS) %>% 
        summarise(CLASS_SUM = sum(AVG)) %>% 
    unite(CLASS, SUPERGROUP, PHYLUM, CLASS, sep = "_") %>% 
    select(CLASS, SAMPLENAME_2, CLASS_SUM) %>% 
    pivot_wider(names_from = SAMPLENAME_2, values_from = CLASS_SUM, values_fill = 0) %>% 
    column_to_rownames(var = "CLASS")
# head(endemic_processed)
## Take wide data frame and CLR transform, pivot to wide, and plot
svg("tileplot-endemic-bysample.svg", h = 6, w = 20)
data.frame(compositions::clr(endemic_processed)) %>% 
      rownames_to_column(var = "CLASS") %>%
      pivot_longer(cols = starts_with(all), values_to = "CLR", names_to = "SAMPLENAME_2") %>% 
      separate(SAMPLENAME_2, c("SAMPLENAME", "VENT"), sep = "_") %>% 
      separate(SAMPLENAME, c("SITE", "SAMPLETYPE", "YEAR", "Sample_tmp"), remove = TRUE) %>% 
      mutate(VENT = str_replace_all(VENT, "\\.", " ")) %>% 
      mutate(REGION = case_when(
        SITE == "GordaRidge" ~ "Gorda Ridge",
        SITE %in% mcr ~ "Mid-Cayman Rise",
        SITE == "Axial" ~ "Axial")) %>% 
      mutate(VENTNAME = case_when(
        REGION == "Gorda Ridge" ~ VENT,
        REGION == "Mid-Cayman Rise" ~ paste(SITE, VENT, sep = " "),
        REGION == "Axial" ~ VENT
        # REGION == "Axial" ~ paste(VENT, YEAR, sep = " ")
      )) %>% select(-Sample_tmp) %>% 
    unite(SAMPLE, SITE, VENTNAME, sep = " ", remove = FALSE) %>% 
    separate(CLASS, c("Supergroup", "Phylum", "Class"), sep = "_", remove = FALSE) %>%
    ggplot(aes(x = SAMPLE, y = Class, fill = CLR)) +
      geom_tile(color = "#252525") + 
      theme(legend.position = "right", 
            panel.grid.major = element_blank(),
            panel.grid.minor = element_blank(), 
            panel.border = element_blank(), 
            panel.background = element_blank(),
            axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5, color = "black",size = 8), 
            axis.text.y = element_text(color = "black", size = 8),
            strip.background = element_blank(), 
            strip.text.y = element_text(hjust = 0, vjust = 0.5, angle = 0),
            legend.title = element_blank(),
            strip.placement = "outside") + 
      labs(x = "", y = "") +
  coord_flip() +
      # scale_fill_gradient2(low = "#4575b4", mid = "white", high = "#d73027", na.value = "grey50") +
  scale_fill_steps2(
  low = "#2166ac",
  mid = "white",
  high = "#b2182b",
  midpoint = 0,
  space = "Lab",
  na.value = "#4d4d4d",
  guide = "coloursteps",
  aesthetics = "fill"
) +
      facet_grid(SITE ~ Supergroup + Phylum, space = "free", scales = "free", switch = "both")
## Warning: Expected 4 pieces. Additional pieces discarded in 2752 rows [1, 2, 3,
## 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, ...].
## Warning: Expected 3 pieces. Additional pieces discarded in 224 rows [33, 34, 35,
## 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, ...].
dev.off()
## quartz_off_screen 
##                 2
# ?scale_fill_steps2()

14.2.2 Community composition of vent endemics

pending CLR to triangle plot? triangle plot with relative abundance - are there distinct signatures of vent endemics by region?? Are there clusters on the triangle plot??

Subset dataset to create endemic dataset and a vent inclusive dataset.

make_bar_relabun(endemic, all)

plot_grid(
  make_pca(endemic, axial),
  make_pca(endemic, mcr),
  make_pca(endemic, gr),
  make_pca(endemic, all),
  ncol = 2
)
## Warning: Expected 4 pieces. Additional pieces discarded in 2505 rows [1, 2, 3,
## 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, ...].
## Warning: Expected 4 pieces. Additional pieces discarded in 2274 rows [1, 2, 3,
## 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, ...].

## Warning: Expected 4 pieces. Additional pieces discarded in 1372 rows [1, 2, 3,
## 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, ...].

## Warning: Expected 4 pieces. Additional pieces discarded in 6151 rows [1, 2, 3,
## 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, ...].

# make_pca(endemic, all)

14.3 Add metadata heat map with tile plot from above

# colnames(endemic)
endemic_env <- function(x){
  endemic %>% 
  unite(SAMPLENAME_2, SAMPLENAME, VENT, sep = "_") %>% 
  separate(SAMPLENAME_2, c("SAMPLENAME", "VENT"), sep = "_") %>% 
      separate(SAMPLENAME, c("SITE", "SAMPLETYPE", "YEAR", "Sample_tmp"), remove = TRUE) %>% 
      mutate(VENT = str_replace_all(VENT, "\\.", " ")) %>% 
      mutate(REGION = case_when(
        SITE == "GordaRidge" ~ "Gorda Ridge",
        SITE %in% mcr ~ "Mid-Cayman Rise",
        SITE == "Axial" ~ "Axial")) %>% 
      mutate(VENTNAME = case_when(
        REGION == "Gorda Ridge" ~ VENT,
        REGION == "Mid-Cayman Rise" ~ paste(SITE, VENT, sep = " "),
        REGION == "Axial" ~ VENT
      )) %>% select(-Sample_tmp) %>% 
    unite(SAMPLE, SITE, VENTNAME, sep = " ", remove = FALSE) %>%  
  select(SAMPLE, SITE, VENTNAME, DEPTH:ProkConc) %>% 
  pivot_longer(cols = TEMP:ProkConc, names_to = "MEASUREMENT", values_to = "VALUE") %>% 
  filter(MEASUREMENT == x) %>% 
    distinct() %>% 
  ggplot(aes(x = SAMPLE, y = MEASUREMENT, fill = VALUE)) +
  geom_tile() +
  coord_flip() +
  facet_grid(SITE ~ MEASUREMENT, switch = "both", space = "free", scale = "free") + theme_linedraw() +
    theme(axis.text.y = element_blank(),
          strip.text = element_blank(),
          strip.background = element_blank(),
          axis.ticks = element_blank(),
          strip.placement = "outside",
          legend.title = element_blank(),
          legend.position = "top",
          legend.text = element_text(size = 5),
          panel.background = element_blank(),
          panel.border = element_blank(),
          panel.grid = element_blank()) +
    labs(x = "", y = "") +
    viridis::scale_fill_viridis(option = "inferno")
}
# colnames(endemic)
# svg("env-heatmap.svg", w = 12, h = 4)
plot_grid(
 endemic_env("TEMP") + 
   theme(axis.text.y = element_text(color = "black"), 
         strip.text.y = element_text(color = "black"), 
         strip.placement = "outside"),
 endemic_env("PercSeawater"),
 endemic_env("pH"),
 endemic_env("Mg"),
 endemic_env("NO3"),
 endemic_env("H2"),
 endemic_env("CH4"),
 endemic_env("H2S"),
 nrow = 1,
 rel_widths = c(5,1,1,1,1,1,1,1)
)
## Warning: Expected 4 pieces. Additional pieces discarded in 6252 rows [1, 2, 3,
## 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, ...].

## Warning: Expected 4 pieces. Additional pieces discarded in 6252 rows [1, 2, 3,
## 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, ...].

## Warning: Expected 4 pieces. Additional pieces discarded in 6252 rows [1, 2, 3,
## 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, ...].

## Warning: Expected 4 pieces. Additional pieces discarded in 6252 rows [1, 2, 3,
## 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, ...].

## Warning: Expected 4 pieces. Additional pieces discarded in 6252 rows [1, 2, 3,
## 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, ...].

## Warning: Expected 4 pieces. Additional pieces discarded in 6252 rows [1, 2, 3,
## 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, ...].

## Warning: Expected 4 pieces. Additional pieces discarded in 6252 rows [1, 2, 3,
## 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, ...].

## Warning: Expected 4 pieces. Additional pieces discarded in 6252 rows [1, 2, 3,
## 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, ...].

# dev.off()

15 Incorporate metadata parameters.

Import grazing data as output from previous, plot with bubbles underneath of specific parameters.

15.1 Run ANOVA - MANOVA

?vegan::adonis
# distance matrix from vent-only ASV profiles and environmental parameters
#QUESTIONS
## What happens with NAs?
## Subset so I have as many samples and environmental parameters?
load("asv-tables-processed-18102021.RData", verbose = T)
## Loading objects:
##   asv_insitu
##   asv_insitu_qc
##   insitu_asv_wClass
# head(asv_insitu_qc)
# head(asv_insitu_qc %>% select(SAMPLENAME, TEMP, pH, Mg, ProkConc) %>% distinct()
colnames(insitu_asv_wClass)
##  [1] "FeatureID"         "SAMPLE"            "value"            
##  [4] "Taxon"             "Domain"            "Supergroup"       
##  [7] "Phylum"            "Class"             "Order"            
## [10] "Family"            "Genus"             "Species"          
## [13] "Consensus"         "SAMPLENAME"        "VENT"             
## [16] "COORDINATES"       "SITE"              "Sample_or_Control"
## [19] "SAMPLEID"          "DEPTH"             "SAMPLETYPE"       
## [22] "YEAR"              "TEMP"              "pH"               
## [25] "PercSeawater"      "Mg"                "NO3"              
## [28] "H2"                "H2S"               "CH4"              
## [31] "ProkConc"          "Sample_actual"     "Type"             
## [34] "DATASET"           "DECONTAM"          "CLASS"            
## [37] "SITE_CLASS"
vent_metadata <- insitu_asv_wClass %>% 
  filter(Domain == "Eukaryota") %>% 
  # Average across replicates
  separate(SAMPLENAME, c("SITE", "SAMPLETYPE", "YEAR", "Sample_tmp"), remove = TRUE) %>% 
  mutate(REGION = case_when(
    SITE == "GordaRidge" ~ "Gorda Ridge",
    SITE %in% mcr ~ "Mid-Cayman Rise",
    SITE == "Axial" ~ "Axial")) %>% 
  mutate(VENTNAME = case_when(
    REGION == "Gorda Ridge" ~ VENT,
    REGION == "Mid-Cayman Rise" ~ paste(SITE, VENT, sep = " "),
    REGION == "Axial" ~ paste(VENT, YEAR, sep = " ")
  )) %>% select(-Sample_tmp) %>% 
  unite(SAMPLE, SAMPLETYPE, SITE, VENTNAME, sep = "_", remove = FALSE) %>%
      mutate(SAMPLE = gsub(SAMPLE, pattern = "_", replacement = " ")) %>%
      mutate(SAMPLE = gsub(SAMPLE, pattern = "Vent Axial", replacement = "Axial")) %>%
      mutate(SAMPLE = gsub(SAMPLE, pattern = "Vent GordaRidge", replacement = "Gorda Ridge")) %>%
      mutate(SAMPLE = gsub(SAMPLE, pattern = "GordaRidge", replacement = "Gorda Ridge")) %>%
      mutate(SAMPLE = gsub(SAMPLE, pattern = "Plume ", replacement = "")) %>%
      mutate(SAMPLE = gsub(SAMPLE, pattern = "Vent VonDamm VonDamm", replacement = "Von Damm")) %>%
      mutate(SAMPLE = gsub(SAMPLE, pattern = "Vent Piccard Piccard", replacement = "Piccard")) %>%
      mutate(SAMPLE = gsub(SAMPLE, pattern = "Background GordaRidge", replacement = "Gorda Ridge")) %>%
      mutate(SAMPLE = gsub(SAMPLE, pattern = "VonDamm VonDamm", replacement = "Von Damm")) %>%
      mutate(SAMPLE = gsub(SAMPLE, pattern = "Piccard Piccard", replacement = "Piccard")) %>%
      mutate(SAMPLE = gsub(SAMPLE, pattern = " BSW", replacement = "")) %>%
      mutate(SAMPLE = gsub(SAMPLE, pattern = "Background Axial Deep seawater 2015", replacement = "Background Axial 2015")) %>%
    pivot_longer(cols = TEMP:ProkConc, names_to = "MEASUREMENT", values_to = "VALUE") %>% 
      group_by(SAMPLE, MEASUREMENT) %>% 
      summarise(MEAN = mean(VALUE)) %>%
      distinct() %>% 
      pivot_wider(names_from = "MEASUREMENT", values_from = "MEAN") %>%
  column_to_rownames(var = "SAMPLE")
## Warning: Expected 4 pieces. Additional pieces discarded in 26125 rows [1, 2, 3,
## 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, ...].
vent_asvs <- insitu_asv_wClass %>% 
  filter(Domain == "Eukaryota") %>% 
  # Average across replicates
  separate(SAMPLENAME, c("SITE", "SAMPLETYPE", "YEAR", "Sample_tmp"), remove = TRUE) %>% 
  mutate(REGION = case_when(
    SITE == "GordaRidge" ~ "Gorda Ridge",
    SITE %in% mcr ~ "Mid-Cayman Rise",
    SITE == "Axial" ~ "Axial")) %>% 
  mutate(VENTNAME = case_when(
    REGION == "Gorda Ridge" ~ VENT,
    REGION == "Mid-Cayman Rise" ~ paste(SITE, VENT, sep = " "),
    REGION == "Axial" ~ paste(VENT, YEAR, sep = " ")
  )) %>% select(-Sample_tmp) %>% 
  unite(SAMPLE, SAMPLETYPE, SITE, VENTNAME, sep = "_", remove = FALSE) %>%
      mutate(SAMPLE = gsub(SAMPLE, pattern = "_", replacement = " ")) %>%
      mutate(SAMPLE = gsub(SAMPLE, pattern = "Vent Axial", replacement = "Axial")) %>%
      mutate(SAMPLE = gsub(SAMPLE, pattern = "Vent GordaRidge", replacement = "Gorda Ridge")) %>%
      mutate(SAMPLE = gsub(SAMPLE, pattern = "GordaRidge", replacement = "Gorda Ridge")) %>%
      mutate(SAMPLE = gsub(SAMPLE, pattern = "Plume ", replacement = "")) %>%
      mutate(SAMPLE = gsub(SAMPLE, pattern = "Vent VonDamm VonDamm", replacement = "Von Damm")) %>%
      mutate(SAMPLE = gsub(SAMPLE, pattern = "Vent Piccard Piccard", replacement = "Piccard")) %>%
      mutate(SAMPLE = gsub(SAMPLE, pattern = "Background GordaRidge", replacement = "Gorda Ridge")) %>%
      mutate(SAMPLE = gsub(SAMPLE, pattern = "VonDamm VonDamm", replacement = "Von Damm")) %>%
      mutate(SAMPLE = gsub(SAMPLE, pattern = "Piccard Piccard", replacement = "Piccard")) %>%
      mutate(SAMPLE = gsub(SAMPLE, pattern = " BSW", replacement = "")) %>%
      mutate(SAMPLE = gsub(SAMPLE, pattern = "Background Axial Deep seawater 2015", replacement = "Background Axial 2015")) %>%
  group_by(SAMPLE, FeatureID) %>% 
  summarise(MEAN = mean(value)) %>% 
  ungroup() %>%
  pivot_wider(names_from = "SAMPLE", values_from = "MEAN", values_fill = 0) %>% 
  column_to_rownames(var = "FeatureID")
## Warning: Expected 4 pieces. Additional pieces discarded in 26125 rows [1, 2, 3,
## 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, ...].
# head(vent_asvs)
# ?vegdist()
# vent_dist <- vegdist(vent_asvs, method = "jaccard")
# class(vent_dist)
# tmp <- adonis(vent_dist ~ TEMP*ProkConc, data = vent_metadata, permutations = 99)
# head(vent_asvs)
# adonis(dune ~ Management * A1, data = dune.env, permutations = 99)
# class(dune.env)
# ?adonis()

15.2 Temperature & prokaryote cell concentration

Plot sequence relative abundance by temperature and prokaryote concentration. These two parameters were chosen because I have the most metadata from them. If a sample was not countable or had no temperature record, it was removed.

15.2.1 Plot as a factor of relative sequence abundance

asv_insitu_qc %>% 
    # filter(SITE %in% selection) %>% 
    filter(!is.na(TEMP)) %>% 
    filter(!is.na(ProkConc)) %>% 
    filter(Domain == "Eukaryota") %>%
    mutate(Supergroup = ifelse(is.na(Supergroup), "Unknown Eukaryota", Supergroup),
           Phylum = ifelse(is.na(Phylum), "Unknown", Phylum),
           Phylum = ifelse(Phylum == "Alveolata_X", "Ellobiopsidae", Phylum),
           Supergroup = ifelse(Supergroup == "Alveolata", paste(Supergroup, Phylum, sep = "-"), Supergroup)) %>% 
    group_by(FeatureID, Taxon, Domain, Supergroup, Phylum, Class, Order, Family, Genus, Species,
             VENT, SITE, SAMPLETYPE, YEAR, DATASET, TEMP, ProkConc) %>% 
    summarise(SEQ_AVG_REP = mean(value)) %>% 
    ungroup() %>% 
    unite(SupergroupPhylum, Supergroup, Phylum, sep = "-") %>% 
    group_by(SITE, SAMPLETYPE, VENT, YEAR, SupergroupPhylum, TEMP, ProkConc) %>% 
      summarise(SEQ_SUM = sum(SEQ_AVG_REP)) %>% 
    ggplot(aes(x = ProkConc, y = as.numeric(TEMP), fill = SITE, shape = SAMPLETYPE)) +
      geom_point(color = "black", aes(size = SEQ_SUM)) +
      scale_size_continuous(range = c(4,9)) +
      scale_shape_manual(values = c(21, 23, 24)) +
      scale_x_log10() +
      facet_wrap(SupergroupPhylum ~., scale = "free") +
      theme_linedraw() +
      theme(axis.text = element_text(color = "black", size = 12),
          strip.background = element_blank(), strip.text = element_text(color = "black"),
          legend.position = "right") +
    # scale_y_continuous(expand = c(0,0)) +
    scale_fill_manual(values = c("#feb24c", "#addd8e", "#de2d26", "#1c9099")) +
    guides(fill = guide_legend(override.aes = list(shape = c(21))),
           shape = guide_legend(override.aes = list(fill = "black"))) +
    labs(x = bquote("Cells "~mL^-1 ~hr^-1), y = "Temperature (C)")

15.2.2 Plot as a factor of CLR transformed data

Repeat above plot, but with CLR transformed data.

df_wide_tmp <- asv_insitu_qc %>% 
    filter(!is.na(TEMP)) %>% 
    filter(!is.na(ProkConc)) %>% 
    filter(Domain == "Eukaryota") %>% 
    filter(value > 0) %>% 
  # Average across replicates
      group_by(FeatureID, SAMPLENAME, VENT, Domain, Supergroup, Phylum, Class, Order, Family, Genus, Species, TEMP, ProkConc) %>% 
        summarise(AVG = mean(value)) %>% 
      ungroup() %>% 
    # Sum to the Order taxonomic classification
    unite(SAMPLENAME_2, SAMPLENAME, VENT, TEMP, ProkConc, sep = "_") %>% 
    unite(TAX, FeatureID, Supergroup, Phylum, sep = " ") %>% 
    select(TAX, SAMPLENAME_2, AVG) %>% 
    pivot_wider(names_from = SAMPLENAME_2, values_from = AVG, values_fill = 0) %>% 
    column_to_rownames(var = "TAX")

  ## Take wide data frame and CLR transform, pivot to wide, and plot
  clr_long_df <- data.frame(compositions::clr(df_wide_tmp)) %>% 
      rownames_to_column(var = "TAX") %>%
      pivot_longer(cols = starts_with(all), values_to = "CLR", names_to = "SAMPLENAME_2") %>% 
      separate(SAMPLENAME_2, c("SAMPLENAME", "VENT", "TEMP", "ProkConc"), sep = "_") %>% 
      separate(SAMPLENAME, c("SITE", "SAMPLETYPE", "YEAR", "Sample_tmp"), remove = TRUE) %>% 
      mutate(VENT = str_replace_all(VENT, "\\.", " ")) %>% 
      mutate(REGION = case_when(
        SITE == "GordaRidge" ~ "Gorda Ridge",
        SITE %in% mcr ~ "Mid-Cayman Rise",
        SITE == "Axial" ~ "Axial")) %>% 
      mutate(VENTNAME = case_when(
        REGION == "Gorda Ridge" ~ VENT,
        REGION == "Mid-Cayman Rise" ~ paste(SITE, VENT, sep = " "),
        REGION == "Axial" ~ paste(VENT, YEAR, sep = " ")
      )) %>% select(-Sample_tmp) %>% 
    unite(SAMPLE, REGION, VENTNAME, sep = " ", remove = FALSE) %>% 
    separate(TAX, c("ASVid","Supergroup", "Phylum"), sep = " ", remove = TRUE) %>% 
    unite(SupergroupPhylum, Supergroup, Phylum, sep = "-")
## Warning: Expected 4 pieces. Additional pieces discarded in 173718 rows [1, 2, 3,
## 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, ...].
  # head(clr_long_df)
  ## Plot
  clr_long_df %>% 
    filter(SAMPLETYPE == "Vent") %>% 
    ggplot(aes(x = as.numeric(ProkConc), y = as.numeric(TEMP), fill = CLR, shape = REGION)) +
      geom_point(color = "black", size = 3, aes(fill = CLR, shape = REGION)) +
      scale_fill_gradient2(low = "#4575b4", mid = "white", high = "#d73027", na.value = "grey50") +
      scale_shape_manual(values = c(21, 23, 24, 25)) +
      scale_x_log10() +
      facet_wrap(SupergroupPhylum ~ ., scale = "free") +
      theme_linedraw() +
      theme(axis.text = element_text(color = "black", size = 12),
          strip.background = element_blank(), strip.text = element_text(color = "black"),
          legend.position = "right") +
    guides(fill = guide_legend(override.aes = list(shape = c(21))),
           shape = guide_legend(override.aes = list(fill = "black"))) +
    labs(x = bquote("Cells "~mL^-1 ~hr^-1), y = "Temperature (C)")

15.3 Grazing rate (Gorda Ridge & MCR only)

pending

16 HPC analysis code below

Diversity model/estimation and network analysis to be run on HPC.

load("asv-tables-processed-18102021.RData", verbose = T)

17 Diversity estimators

DivNet package - diversity estimation hypothesis testing from Amy Willis’s group. This will also characterize the uncertainty of the richness estimate. Richness estimation is flawed because of sample depth and processing methods.

library(phyloseq); library(breakaway); library(DivNet)
library(tidyverse)

This code block run on HPC.

# Select eukaryotes only and create wide format dataframe
insitu_wide <- asv_insitu_qc %>% 
  filter(Domain == "Eukaryota") %>% 
  filter(!grepl("_Plume001_", SAMPLE)) %>% #removing "near vent background", not relevant in other data sets
  select(FeatureID, Taxon, SAMPLE, value) %>%
  pivot_wider(names_from = SAMPLE, values_from = value, values_fill = 0)

# head(insitu_wide)
insitu_samples <- as.character(colnames(insitu_wide %>% select(-Taxon, -FeatureID)))
# insitu_samples
insitu_tax_matrix <- insitu_wide %>%
  select(FeatureID, Taxon) %>% 
  separate(Taxon, c("Domain", "Supergroup", 
                  "Phylum", "Class", "Order",
                  "Family", "Genus", "Species"), sep = ";") %>% 
  column_to_rownames(var = "FeatureID") %>% 
  as.matrix
## Warning: Expected 8 pieces. Additional pieces discarded in 6222 rows [3, 4, 6,
## 7, 9, 10, 11, 12, 15, 17, 18, 20, 22, 23, 24, 25, 27, 28, 29, 32, ...].
## Warning: Expected 8 pieces. Missing pieces filled with `NA` in 4264 rows [1, 2,
## 5, 8, 13, 14, 16, 19, 21, 26, 30, 31, 33, 40, 41, 45, 46, 47, 48, 50, ...].
insitu_asv_matrix <- insitu_wide %>% 
  select(-Taxon) %>% 
  column_to_rownames(var = "FeatureID") %>% 
  as.matrix

# Align row names for each matrix
rownames(insitu_tax_matrix) <- row.names(insitu_asv_matrix)

## Extract relevant metadata information
# head(metadata)
metadata_insitu <- metadata %>%
  filter(SAMPLE %in% insitu_samples) %>% # from reformatting df above
  select(SAMPLE, VENT, SITE, SAMPLETYPE, YEAR) %>%
  unite(SAMPLELABEL, VENT, SITE, SAMPLETYPE, YEAR, sep = "_", remove = FALSE) %>% 
  unite(TYPE_SITE, SITE, SAMPLETYPE, sep = "_", remove = FALSE)
rownames(metadata_insitu) <- metadata_insitu$SAMPLE
# View(metadata_insitu)
# head(metadata_insitu)
# row.names(metadata_insitu)

Import taxa and ASV count matrices into phyloseq objects.

# Import asv and tax matrices
ASV = otu_table(insitu_asv_matrix, taxa_are_rows = TRUE)
TAX = tax_table(insitu_tax_matrix)

phylo_obj <- phyloseq(ASV, TAX)

# Import metadata as sample data in phyloseq
samplenames <- sample_data(metadata_insitu)

# join as phyloseq object
physeq_insitu = merge_phyloseq(phylo_obj, samplenames)

## Check
physeq_insitu
# head(insitu_tax_matrix)
# head(metadata_insitu)
# ?divnet()
# Glom tax levels at the Order level, then perform divnet analysis
order_divnet <- divnet(tax_glom(physeq_insitu, taxrank = "Order"), base = 3)
order_divnet_label <- divnet(tax_glom(physeq_insitu, taxrank = "Order"), X = "SAMPLELABEL", base = 3)
# Vent vs plume vs background
order_divnet_TYPE <- divnet(tax_glom(physeq_insitu, taxrank = "Order"), X = "SAMPLETYPE", base = 3)
# location and vent vs plume vs background
order_divnet_TYPE_SITE <- divnet(tax_glom(physeq_insitu, taxrank = "Order"), X = "TYPE_SITE", base = 3)

save(order_divnet, order_divnet_label, order_divnet_TYPE, order_divnet_TYPE_SITE, file = "ORDER.Rdata")

###
fam_divnet <- divnet(tax_glom(physeq_insitu, taxrank = "Family"), base = 3)
fam_divnet_label <- divnet(tax_glom(physeq_insitu, taxrank = "Family"), X = "SAMPLELABEL", base = 3)
# Vent vs plume vs background
fam_divnet_TYPE <- divnet(tax_glom(physeq_insitu, taxrank = "Family"), X = "SAMPLETYPE", base = 3)
# location and vent vs plume vs background
fam_divnet_TYPE_SITE <- divnet(tax_glom(physeq_insitu, taxrank = "Family"), X = "TYPE_SITE", base = 3)
save(fam_divnet, fam_divnet_label, fam_divnet_TYPE, fam_divnet_TYPE_SITE, file = "FAMILY.Rdata")

###
gen_divnet <- divnet(tax_glom(physeq_insitu, taxrank = "Genus"), base = 3)
gen_divnet_label <- divnet(tax_glom(physeq_insitu, taxrank = "Genus"), X = "SAMPLELABEL", base = 3)
# Vent vs plume vs background
gen_divnet_TYPE <- divnet(tax_glom(physeq_insitu, taxrank = "Genus"), X = "SAMPLETYPE", base = 3)
# location and vent vs plume vs background
gen_divnet_TYPE_SITE <- divnet(tax_glom(physeq_insitu, taxrank = "Genus"), X = "TYPE_SITE", base = 3)
save(gen_divnet, gen_divnet_label, gen_divnet_TYPE, gen_divnet_TYPE_SITE, file = "GENUS.Rdata")

###
spp_divnet <- divnet(tax_glom(physeq_insitu, taxrank = "Species"), base = 3)
spp_divnet_label <- divnet(tax_glom(physeq_insitu, taxrank = "Species"), X = "SAMPLELABEL", base = 3)
# Vent vs plume vs background
spp_divnet_TYPE <- divnet(tax_glom(physeq_insitu, taxrank = "Species"), X = "SAMPLETYPE", base = 3)
# location and vent vs plume vs background
spp_divnet_TYPE_SITE <- divnet(tax_glom(physeq_insitu, taxrank = "Species"), X = "TYPE_SITE", base = 3)
save(spp_divnet, spp_divnet_label, spp_divnet_TYPE, spp_divnet_TYPE_SITE, file = "SPECIES.Rdata")

Above run on HPC and RData files save so we can look at various levels of species richness.

Function to extract shannon and simpson data from each divnet output.

# ?pivot_longer()
fxn_extract_divet <- function(df){
  df$shannon %>% summary %>% 
  pivot_longer(cols = starts_with("estimate"), names_to = "ESTIMATE-shannon", values_to = "Shannon") %>%
  pivot_longer(cols = starts_with("error"), names_to = "ERROR-shannon", values_to = "Shannon-error") %>%
  pivot_longer(cols = starts_with("lower"), names_to = "LOWER-shannon", values_to = "Shannon-lower") %>%
  pivot_longer(cols = starts_with("upper"), names_to = "UPPER-shannon", values_to = "Shannon-upper") %>%
  left_join(df$simpson %>% summary %>%
      pivot_longer(cols = starts_with("estimate"), names_to = "ESTIMATE-simpson", values_to = "Simpson") %>%
      pivot_longer(cols = starts_with("error"), names_to = "ERROR-simpson", values_to = "Simpson-error") %>%
      pivot_longer(cols = starts_with("lower"), names_to = "LOWER-simpson", values_to = "Simpson-lower") %>%
      pivot_longer(cols = starts_with("upper"), names_to = "UPPER-simpson", values_to = "Simpson-upper"),
    by = c("sample_names" = "sample_names")) %>%
  left_join(metadata_insitu %>% rownames_to_column(var = "sample_names")) %>%
  select(-sample_names, -ends_with("-simpson"), -ends_with("-shannon"), -starts_with("model."), -starts_with("name.")) %>%
  distinct()
}

Function to create plots

plot_sampletype <- function(df){
  plot_grid(df %>%
  # ggplot(aes(x = VENT, y = Shannon)) +
  ggplot(aes(x = SAMPLETYPE, y = Shannon, group = SAMPLETYPE)) +
  # geom_errorbar(aes(ymin = `Shannon-lower`, ymax = `Shannon-upper`), color = "#525252", width = 0.2) +
    geom_point(shape = 21, color = "#525252", size = 2, aes(fill = SAMPLETYPE)) +
  # facet_grid(. ~ SITE + SAMPLETYPE + YEAR, space = "free_x", scales = "free_x") +
    geom_violin(aes(fill = SAMPLETYPE), color = "#525252", alpha = 0.5, width = 0.5, draw_quantiles = c(0.25, 0.5, 0.75)) +
    geom_jitter(shape = 21, color = "#525252", size = 2, aes(fill = SAMPLETYPE)) +
    scale_fill_manual(values = c("#ffffff", "#969696", "#252525")) +
    # scale_fill_brewer(palette = "Set2") +
  theme_linedraw() +
  theme(axis.text.y = element_text(size = 14),
        axis.text.x = element_blank(),
        strip.background = element_blank(),
        strip.text = element_text(color = "black"),
        legend.position = "none",
        axis.ticks.x = element_blank()) +
  labs(x = "", y = "Shannon"),
  df %>%
  # ggplot(aes(x = VENT, y = Simpson)) +
    ggplot(aes(x = SAMPLETYPE, y = Shannon, group = SAMPLETYPE)) +
  # geom_errorbar(aes(ymin = `Simpson-lower`, ymax = `Simpson-upper`), color = "#525252", width = 0.2) +
    # geom_point(shape = 21, color = "#525252", size = 2, aes(fill = SAMPLETYPE)) +
  # facet_grid(. ~ SITE + SAMPLETYPE + YEAR, space = "free_x", scales = "free_x") +
    geom_violin(aes(fill = SAMPLETYPE), color = "#525252", alpha = 0.5, width = 0.5, draw_quantiles = c(0.25, 0.5, 0.75)) +
    geom_jitter(shape = 21, color = "#525252", size = 2, aes(fill = SAMPLETYPE)) +
    scale_fill_manual(values = c("#ffffff", "#969696", "#252525")) +
    # scale_fill_brewer(palette = "Set2") +
  theme_linedraw() +
  theme(axis.text.x = element_text(vjust = 1, hjust = 0.5, size = 14),
        axis.text = element_text(size = 14),
        strip.background = element_blank(),
        strip.text = element_blank(),
        legend.title = element_blank(),
        legend.position = "bottom") +
  labs(x = "", y = "Simpson"),
  ncol = 1, axis = c("lrt"), align = c("vh"))
}
load("data-input/ORDER.Rdata", verbose = T)
## Loading objects:
##   order_divnet
##   order_divnet_label
##   order_divnet_TYPE
##   order_divnet_TYPE_SITE
order_alpha_18s <- fxn_extract_divet(order_divnet)
order_alpha_label <- fxn_extract_divet(order_divnet_label)
order_alpha_TYPE <- fxn_extract_divet(order_divnet_TYPE)
order_alpha_TYPE_SITE <- fxn_extract_divet(order_divnet_TYPE_SITE)
plot_grid(plot_sampletype(order_alpha_18s),
          plot_sampletype(order_alpha_label),
          ncol = 2)

load("data-input/FAMILY.Rdata", verbose = T)
## Loading objects:
##   fam_divnet
##   fam_divnet_label
##   fam_divnet_TYPE
##   fam_divnet_TYPE_SITE
fam_alpha_18s <- fxn_extract_divet(fam_divnet)
fam_alpha_label <- fxn_extract_divet(fam_divnet_label)
fam_alpha_TYPE <- fxn_extract_divet(fam_divnet_TYPE)
fam_alpha_TYPE_SITE <- fxn_extract_divet(fam_divnet_TYPE_SITE)
plot_grid(plot_sampletype(fam_alpha_18s),
          plot_sampletype(fam_alpha_label),
          ncol = 2)

load("data-input/GENUS.Rdata", verbose = T)
## Loading objects:
##   gen_divnet
##   gen_divnet_label
##   gen_divnet_TYPE
##   gen_divnet_TYPE_SITE
gen_alpha_18s <- fxn_extract_divet(gen_divnet)
gen_alpha_label <- fxn_extract_divet(gen_divnet_label)
gen_alpha_TYPE <- fxn_extract_divet(gen_divnet_TYPE)
gen_alpha_TYPE_SITE <- fxn_extract_divet(gen_divnet_TYPE_SITE)
plot_grid(plot_sampletype(gen_alpha_18s),
          plot_sampletype(gen_alpha_label),
          ncol = 2)

plot_sampletype(gen_alpha_label)

load("data-input/SPECIES.Rdata", verbose = T)
## Loading objects:
##   spp_divnet
##   spp_divnet_label
##   spp_divnet_TYPE
##   spp_divnet_TYPE_SITE
spp_alpha_18s <- fxn_extract_divet(spp_divnet)
spp_alpha_label <- fxn_extract_divet(spp_divnet_label)
spp_alpha_TYPE <- fxn_extract_divet(spp_divnet_TYPE)
spp_alpha_TYPE_SITE <- fxn_extract_divet(spp_divnet_TYPE_SITE)
plot_grid(plot_sampletype(spp_alpha_18s),
          plot_sampletype(spp_alpha_label),
          ncol = 2)

plot_sampletype(spp_alpha_18s)

testDiversity(spp_divnet_TYPE_SITE, "shannon")
## Hypothesis testing:
##   p-value for global test: 0
##                                  Estimates Standard Errors p-values
## (Intercept)                      2.6026981      0.04265830    0.000
## predictorsAxial_Plume            0.5741397      1.28560932    0.655
## predictorsAxial_Vent             1.1618210      0.08980834    0.000
## predictorsGordaRidge_Background  0.1571173      0.27974388    0.574
## predictorsGordaRidge_Plume       1.0589493      0.27181306    0.000
## predictorsGordaRidge_Vent        0.7711650      0.05643055    0.000
## predictorsPiccard_Background     0.6836154      1.10179528    0.535
## predictorsPiccard_Plume          0.7325748      0.28549380    0.010
## predictorsPiccard_Vent           1.0182122      0.26806101    0.000
## predictorsVonDamm_Background     0.5959967      0.33149807    0.072
## predictorsVonDamm_Plume          1.1301007      0.45120741    0.012
## predictorsVonDamm_Vent          -0.4697728      0.15749319    0.003

17.1 Plot species richness for different sample types

17.1.1 All samples

head(order_alpha_18s)

Save for presentation

# svg("Shannon-violin-plot.svg",)
# order_alpha_18s %>% 
#   # ggplot(aes(x = VENT, y = Shannon)) +
#   ggplot(aes(x = SAMPLETYPE, y = Shannon, group = SAMPLETYPE)) +
#   # geom_errorbar(aes(ymin = `Shannon-lower`, ymax = `Shannon-upper`), color = "#525252", width = 0.2) +
#     # geom_point(shape = 21, color = "#525252", size = 2, aes(fill = SAMPLETYPE)) +
#   # facet_grid(. ~ SITE + SAMPLETYPE + YEAR, space = "free_x", scales = "free_x") +
#     geom_violin(aes(fill = SAMPLETYPE), color = "#525252", alpha = 0.5, width = 0.5, draw_quantiles = c(0.25, 0.5, 0.75)) +   
#     geom_jitter(shape = 21, color = "#525252", size = 3, aes(fill = SAMPLETYPE)) +
#     # scale_fill_brewer(palette = "Set2") +
#   scale_fill_manual(values = c("#1c9099", "#fd8d3c", "#f768a1")) + 
#   theme_linedraw() +
#   theme(axis.text.x = element_text(vjust = 1, hjust = 0.5, size = 14),
#         axis.text = element_text(size = 14),
#         strip.background = element_blank(),
#         strip.text = element_blank(),
#         legend.title = element_blank(),
#         legend.position = "bottom") +
#   labs(x = "", y = "Shannon")
# dev.off()
# head(order_alpha_TYPE)
# plot_grid(order_alpha_TYPE %>% 
#     select(-SAMPLELABEL, -VENT, -SITE, -TYPE_SITE, -YEAR) %>% 
#     distinct() %>% 
#   ggplot(aes(x = SAMPLETYPE, y = Shannon)) +
#   geom_errorbar(aes(ymin = `Shannon-lower`, ymax = `Shannon-upper`), color = "#525252", width = 0.2) +
#     geom_point(shape = 21, color = "#525252", size = 2, aes(fill = SAMPLETYPE)) +
#   # facet_grid(. ~ SITE + SAMPLETYPE + YEAR, space = "free_x", scales = "free_x") +
#     scale_fill_brewer(palette = "Set2") +
#   theme_linedraw() +
#   theme(axis.text.x = element_blank(),
#         strip.background = element_blank(),
#         strip.text = element_text(color = "black"),
#         legend.position = "none",
#         axis.ticks = element_blank()) +
#   labs(x = "", y = "Shannon"),
#   order_alpha_TYPE %>% 
#     select(-SAMPLELABEL, -VENT, -SITE, -TYPE_SITE, -YEAR) %>% 
#     distinct() %>% 
#   ggplot(aes(x = SAMPLETYPE, y = Simpson)) +
#   geom_errorbar(aes(ymin = `Simpson-lower`, ymax = `Simpson-upper`), color = "#525252", width = 0.2) +
#     geom_point(shape = 21, color = "#525252", size = 2, aes(fill = SAMPLETYPE)) +
#   # facet_grid(. ~ SITE + SAMPLETYPE + YEAR, space = "free_x", scales = "free_x") +
#     scale_fill_brewer(palette = "Set2") +
#   theme_linedraw() +
#   theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5),
#         strip.background = element_blank(),
#         strip.text = element_blank(),
#         legend.title = element_blank(),
#         legend.position = "bottom") +
#   labs(x = "", y = "Simpson"),
#   ncol = 1, axis = c("lr"), align = c("v"))
# head(order_alpha_TYPE_SITE)
# plot_grid(order_alpha_TYPE_SITE %>% 
#     select(-SAMPLELABEL, -YEAR, -VENT) %>% 
#     distinct() %>% 
#   ggplot(aes(x = SAMPLETYPE, y = Shannon)) +
#   geom_errorbar(aes(ymin = `Shannon-lower`, ymax = `Shannon-upper`), color = "#525252", width = 0.2) +
#     geom_point(shape = 21, color = "#525252", size = 2, aes(fill = SAMPLETYPE)) +
#   facet_grid(. ~ SITE, space = "free_x", scales = "free_x") +
#     scale_fill_brewer(palette = "Set2") +
#   theme_linedraw() +
#   theme(axis.text.x = element_blank(),
#         strip.background = element_blank(),
#         strip.text = element_text(color = "black"),
#         legend.position = "none",
#         axis.ticks = element_blank()) +
#   labs(x = "", y = "Shannon"),
#   order_alpha_TYPE_SITE %>% 
#     select(-SAMPLELABEL, -YEAR, -VENT) %>% 
#     distinct() %>% 
#   ggplot(aes(x = SAMPLETYPE, y = Simpson)) +
#   geom_errorbar(aes(ymin = `Simpson-lower`, ymax = `Simpson-upper`), color = "#525252", width = 0.2) +
#     geom_point(shape = 21, color = "#525252", size = 2, aes(fill = SAMPLETYPE)) +
#   facet_grid(. ~ SITE, space = "free_x", scales = "free_x") +
#     scale_fill_brewer(palette = "Set2") +
#   theme_linedraw() +
#   theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5),
#         strip.background = element_blank(),
#         strip.text = element_blank(),
#         legend.title = element_blank(),
#         legend.position = "bottom") +
#   labs(x = "", y = "Simpson"),
#   ncol = 1, axis = c("lr"), align = c("v"))

18 Network analysis

If not already done, re-import working dataframes:

load("asv-tables-processed-18102021.RData", verbose = T)
library(phyloseq)
library(SpiecEasi)
# Select eukaryotes only and create wide format dataframe
insitu_wide <- asv_insitu_qc %>%
  filter(Domain == "Eukaryota") %>%
  filter(!grepl("_Plume001_", SAMPLE)) %>% #removing "near vent background", not relevant in other data sets
  select(FeatureID, Taxon, SAMPLE, value) %>%
  pivot_wider(names_from = SAMPLE, values_from = value, values_fill = 0)

# head(insitu_wide)
insitu_samples <- as.character(colnames(insitu_wide %>% select(-Taxon, -FeatureID)))

# make matrices for phyloseq
insitu_tax_matrix <- insitu_wide %>%
  select(FeatureID, Taxon) %>%
  separate(Taxon, c("Domain", "Supergroup",
                  "Phylum", "Class", "Order",
                  "Family", "Genus", "Species"), sep = ";") %>%
  column_to_rownames(var = "FeatureID") %>%
  as.matrix

insitu_asv_matrix <- insitu_wide %>%
  select(-Taxon) %>%
  column_to_rownames(var = "FeatureID") %>%
  as.matrix

# Align row names for each matrix
rownames(insitu_tax_matrix) <- row.names(insitu_asv_matrix)

metadata_insitu <- metadata %>%
  filter(SAMPLE %in% insitu_samples) %>% # from reformatting df above
  select(SAMPLE, VENT, SITE, SAMPLETYPE, YEAR) %>%
  unite(SAMPLELABEL, VENT, SITE, SAMPLETYPE, YEAR, sep = "_", remove = FALSE) %>%
  unite(TYPE_SITE, SITE, SAMPLETYPE, sep = "_", remove = FALSE)

rownames(metadata_insitu) <- metadata_insitu$SAMPLE

# Import asv and tax matrices
ASV = otu_table(insitu_asv_matrix, taxa_are_rows = TRUE)
TAX = tax_table(insitu_tax_matrix)

phylo_obj <- phyloseq(ASV, TAX)

# Import metadata as sample data in phyloseq
samplenames <- sample_data(metadata_insitu)

## Check
physeq_insitu

Run SPIEC-EASI with phyloseq object.

# Run spiec easi with glasso
spec_glasso_microeuk <- spiec.easi(physeq_insitu, method = 'glasso', lambda.min.ratio=1e-2, nlambda=20, pulsar.params = list(rep.num=50))

# save(spec_glasso_microeuk, file = "spiec-easi-output-24-11-21.RData")

18.1

sessionInfo()
## R version 4.1.0 (2021-05-18)
## Platform: x86_64-apple-darwin13.4.0 (64-bit)
## Running under: macOS Mojave 10.14.6
## 
## Matrix products: default
## BLAS/LAPACK: /Users/sarahhu/anaconda3/envs/r_4.1/lib/libopenblasp-r0.3.15.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] DivNet_0.3.7       breakaway_4.7.3    ggdendro_0.1.22    treemapify_2.5.5  
##  [5] vegan_2.5-7        lattice_0.20-44    permute_0.9-5      viridis_0.6.1     
##  [9] viridisLite_0.4.0  plotly_4.10.0      gt_0.3.1           ggupset_0.3.0     
## [13] patchwork_1.1.1    compositions_2.0-2 decontam_1.12.0    phyloseq_1.36.0   
## [17] cowplot_1.1.1      forcats_0.5.1      stringr_1.4.0      dplyr_1.0.7       
## [21] purrr_0.3.4        readr_2.0.0        tidyr_1.1.3        tibble_3.1.3      
## [25] ggplot2_3.3.5      tidyverse_1.3.1   
## 
## loaded via a namespace (and not attached):
##   [1] minqa_1.2.4            colorspace_2.0-2       ellipsis_0.3.2        
##   [4] XVector_0.32.0         fs_1.5.0               rstudioapi_0.13       
##   [7] farver_2.1.0           ggfittext_0.9.1        bit64_4.0.5           
##  [10] fansi_0.5.0            lubridate_1.7.10       xml2_1.3.2            
##  [13] codetools_0.2-18       splines_4.1.0          doParallel_1.0.16     
##  [16] robustbase_0.93-8      knitr_1.33             ade4_1.7-17           
##  [19] jsonlite_1.7.2         nloptr_1.2.2.2         broom_0.7.9           
##  [22] cluster_2.1.2          dbplyr_2.1.1           compiler_4.1.0        
##  [25] httr_1.4.2             backports_1.2.1        assertthat_0.2.1      
##  [28] Matrix_1.3-4           fastmap_1.1.0          lazyeval_0.2.2        
##  [31] cli_3.0.1              htmltools_0.5.2        tools_4.1.0           
##  [34] igraph_1.2.6           gtable_0.3.0           glue_1.4.2            
##  [37] GenomeInfoDbData_1.2.6 reshape2_1.4.4         Rcpp_1.0.7            
##  [40] Biobase_2.52.0         cellranger_1.1.0       jquerylib_0.1.4       
##  [43] vctrs_0.3.8            Biostrings_2.60.1      rhdf5filters_1.4.0    
##  [46] multtest_2.48.0        ape_5.5                nlme_3.1-152          
##  [49] iterators_1.0.13       tensorA_0.36.2         xfun_0.24             
##  [52] lme4_1.1-27.1          rvest_1.0.1            lifecycle_1.0.0       
##  [55] DEoptimR_1.0-9         zlibbioc_1.38.0        MASS_7.3-54           
##  [58] scales_1.1.1           vroom_1.5.4            hms_1.1.0             
##  [61] parallel_4.1.0         biomformat_1.20.0      rhdf5_2.36.0          
##  [64] RColorBrewer_1.1-2     mvnfast_0.2.7          yaml_2.2.1            
##  [67] gridExtra_2.3          sass_0.4.0             stringi_1.7.4         
##  [70] highr_0.9              S4Vectors_0.30.0       foreach_1.5.1         
##  [73] BiocGenerics_0.38.0    boot_1.3-28            GenomeInfoDb_1.28.1   
##  [76] rlang_0.4.11           pkgconfig_2.0.3        bitops_1.0-7          
##  [79] evaluate_0.14          Rhdf5lib_1.14.2        labeling_0.4.2        
##  [82] htmlwidgets_1.5.3      bit_4.0.4              tidyselect_1.1.1      
##  [85] plyr_1.8.6             magrittr_2.0.1         R6_2.5.0              
##  [88] IRanges_2.26.0         generics_0.1.0         DBI_1.1.1             
##  [91] pillar_1.6.2           haven_2.4.3            withr_2.4.2           
##  [94] mgcv_1.8-36            abind_1.4-5            survival_3.2-11       
##  [97] RCurl_1.98-1.3         bayesm_3.1-4           modelr_0.1.8          
## [100] crayon_1.4.1           utf8_1.2.2             tzdb_0.1.2            
## [103] rmarkdown_2.9          grid_4.1.0             readxl_1.3.1          
## [106] data.table_1.14.0      reprex_2.0.1           digest_0.6.27         
## [109] stats4_4.1.0           munsell_0.5.0          bslib_0.3.0